Skip to content

Behavior of XDocumentType/XmlDocumentType.InternalSubset parsing is inconsistent to manual creation #44866

@krwq

Description

@krwq

As can be seen below when manually passing null to the constructor/CreateDocumentType as internal subset it prints nothing when serializing and reports null in the property but when parsing it prints extra when serialized [] (empty internal subset) and doesn't serialize to the same as input and property reports string.Empty.

See following example:

using System;
using System.Xml.Linq;

XDocumentType dtd1 = new XDocumentType("foo", null, null, null);
Console.WriteLine(dtd1); // Prints: <!DOCTYPE foo >
Console.WriteLine(dtd1.InternalSubset ?? "<null>"); // Prints: "<null>"

XDocumentType dtd2 = new XDocumentType("foo", null, null, string.Empty);
Console.WriteLine(dtd2); // Prints: <!DOCTYPE foo []>
Console.WriteLine(dtd2.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)

XDocument xd1 = XDocument.Parse("<!DOCTYPE foo ><foo/>");
Console.WriteLine(xd1.DocumentType.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)
Console.WriteLine(xd1.DocumentType.ToString()); // Prints: <!DOCTYPE foo []> (differs from input)

XDocument xd2 = XDocument.Parse("<!DOCTYPE foo []><foo/>");
Console.WriteLine(xd2.DocumentType.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)
Console.WriteLine(xd2.DocumentType.ToString()); // Prints: <!DOCTYPE foo []>

and similar for XmlDocument

using System;
using System.Xml;

XmlDocument xmldoc3 = new XmlDocument();
XmlDocumentType xdocType1 = xmldoc3.CreateDocumentType("foo", null, null, null);
Console.WriteLine(xdocType1.OuterXml); // Prints: <!DOCTYPE foo>
Console.WriteLine(xdocType1.InternalSubset ?? "<null>"); // Prints: "<null>"

XmlDocumentType xdocType2 = xmldoc3.CreateDocumentType("foo", null, null, string.Empty);
Console.WriteLine(xdocType2.OuterXml); // Prints: <!DOCTYPE foo[]>
Console.WriteLine(xdocType2.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)

XmlDocument xmldoc1 = new XmlDocument();
xmldoc1.LoadXml("<!DOCTYPE foo ><foo/>");
Console.WriteLine(xmldoc1.DocumentType.OuterXml); // Prints: <!DOCTYPE foo[]>
Console.WriteLine(xmldoc1.DocumentType.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)

XmlDocument xmldoc2 = new XmlDocument();
xmldoc2.LoadXml("<!DOCTYPE foo []><foo/>");
Console.WriteLine(xmldoc2.DocumentType.OuterXml); // Prints: <!DOCTYPE foo[]>
Console.WriteLine(xmldoc2.DocumentType.InternalSubset ?? "<null>"); // Prints nothing (string.Empty)

For XmlDocument the docs mention that "The DTD internal subset on the DOCTYPE. If there is no DTD internal subset, String.Empty is returned.". But that does not seem to be true as seen in the first example.

Good thing they're at least consistent between each other.

Additionally recently suggested by the docs telling that XmlDocument.InternalSubset doesn't return null I changed XDocumentType to treat null and string.Empty the same (if passed null you would get same output as for the second example rather than null). I plan to revert this change as part of fix for #44489 - we can re-think the behavior in this issue.

My suggestion is to do one of the following (first and last option would fix consistency):

  • change both parsers and update doc to distinguish between null and string.Empty (third example would show null) - this would be a bit breaking (when parsing you might get null where you wouldn't expect it although rather rare scenario) but probably match with what people would expect - there is a clear distinction what null and empty means here
  • do not change anything (state before my change to XDocumentType), update the docs to reflect reality - not breaking but docs currently document something else and expectations might differ
  • make it treat null and string.Empty the same and always produce <!DOCTYPE foo[]> when serializing and return string.Empty - assuming users do not care about having extra [] in the doctype (cosmetic difference as they have no semantic meaning) this would make it slightly easier to use and at least be consistent between parsing and manually passing subset

cc: @buyaa-n @jozkee @stephentoub

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions