Configuration

Nico Kutscherauer edited this page Sep 22, 2024 · 3 revisions

The parser config is an XDM map of type map(xs:QName, item()*). Each available feature is identified by a specific qualified name. The namespace is always http://www.nkutsche.com/xmlml/parser/features/ (prefix mlpf). Every feature name is also available as a global variable with final visibility. The variable for the feature mlpf:EXAMPLE-FEATURE is named mlml:EXAMPLE-FEATURE.
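
As a minimal sketch, a config map can be written as an XPath 3.1 map constructor whose keys are built with QName() from the namespace given above. The two features and their values are chosen only for illustration, and how the map is handed to the parser is not shown here.

```xpath
(: Sketch only: the selected features and values are illustrative. :)
let $ns := 'http://www.nkutsche.com/xmlml/parser/features/'
return map {
  QName($ns, 'mlpf:STRIP-WHITESPACE')    : 'none',
  QName($ns, 'mlpf:IGNORE-EXTERNAL-DTD') : true()
}
```

Using QName() keeps the example independent of any prefix binding. Where the global variables are in scope, a key such as $mlml:STRIP-WHITESPACE should be usable in place of the QName() call.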

Default Config

Some parser features correspond to features of Xerces, which Saxon commonly uses as its underlying XML parser. If no custom configuration is given, the XmlML parser tries to detect the configuration of the underlying Xerces for these features. The main goal is to achieve similar parsing results.

Features

mlml:STRIP-WHITESPACE

  • QName: mlpf:STRIP-WHITESPACE ($mlml:STRIP-WHITESPACE)
  • Type: xs:string
  • Allowed values: all, ignorable, none
  • Default: detected by underlying Xerces configuration

Configures whether text nodes that contain only whitespace characters (#x9, #xA, #xD, #x20) are treated as regular text nodes (<text>) or as ignorable whitespace (<ws>). Note that text nodes that are descendants of an element marked with @xml:space='preserve' are never treated as ignorable whitespace.

Meaning of the values:

  • all → every whitespace node is treated as ignorable whitespace
  • ignorable → only whitespace nodes that are children of an element whose DTD content model does not allow text content are treated as ignorable whitespace
  • none → no whitespace node is treated as ignorable whitespace

mlml:RESOLVE-DTD-URIS

  • QName: mlpf:RESOLVE-DTD-URIS ($mlml:RESOLVE-DTD-URIS)
  • Type: xs:boolean
  • Default: detected by underlying Xerces configuration

NOT IMPLEMENTED YET.

mlml:EXPAND-DEFAULT-ATTRIBUTES

  • QName: mlpf:EXPAND-DEFAULT-ATTRIBUTES ($mlml:EXPAND-DEFAULT-ATTRIBUTES)
  • Type: xs:boolean
  • Default: detected by underlying Xerces configuration

NOT IMPLEMENTED YET.

mlml:URI_RESOLVER

  • QName: mlpf:URI_RESOLVER ($mlml:URI_RESOLVER)
  • Type: function($href as xs:string, $baseUri as xs:string) as map(xs:string, xs:string)?
  • Default: built-in URI resolver based on the XPath function unparsed-text()

The URI resolver must be a function that expects two arguments of type xs:string. The first argument ($href) is the URI to be resolved. The second argument ($baseUri) is the base URI against which $href is resolved if it is relative.

The return value describes the resource to be assigned to the requested URI. If it is not an empty sequence, it must be a map with the following fields:

| Key | Description | Required |
|---|---|---|
| base-uri | The new base URI of the returned resource | Yes |
| content | The string content of the returned resource | Yes |
| mediatype | The media type of the returned resource | No |
| linefeed | The line feed format. Possible values are: '\n', '\r', '\r\n' | No |

If the URI resolver returns an empty sequence, the built-in URI resolver is used to resolve the URI request.
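
The contract described above can be sketched as an XPath 3.1 inline function. This is an illustration under stated assumptions, not the library's own resolver: the matched URI suffix and the returned entity declaration are hypothetical, and every other request is handed back to the built-in resolver by returning an empty sequence.

```xpath
function($href as xs:string, $baseUri as xs:string) as map(xs:string, xs:string)? {
  (: Resolve $href against the base URI; string() keeps the map values of type xs:string. :)
  let $uri := string(resolve-uri($href, $baseUri))
  return
    if (ends-with($uri, '/entities/local.ent'))   (: hypothetical URI suffix :)
    then map {
      'base-uri'  : $uri,
      'content'   : '<!ENTITY company "Example Corp.">',   (: hypothetical content :)
      'mediatype' : 'application/xml-dtd'
    }
    else ()   (: empty sequence: the built-in URI resolver takes over :)
}
```

Such a function would be the value of the mlpf:URI_RESOLVER entry in the config map. Note that resolve-uri() raises an error if $baseUri is not a valid absolute URI, so a production resolver may want to guard that call.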

mlml:ENTITY_RESOLVER

  • QName: mlpf:ENTITY_RESOLVER ($mlml:ENTITY_RESOLVER)
  • Type: function($publicId as xs:string?, $systemId as xs:string?) as map(xs:string, xs:string)?
  • Default: #unset

The entity resolver must be a function that expects two arguments of type xs:string?. The first argument ($publicId) is the public identifier, the second argument ($systemId) is the system identifier. The entity resolver works similarly to the URI resolver, with the following differences:

  • The entity resolver is called only for external entity references.
  • The entity resolver is called first. If it returns an empty sequence, the URI resolver is called.
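
A minimal sketch of an entity resolver that answers one public identifier and defers everything else. It assumes that the returned map uses the same fields as the URI resolver's result, which the text implies but does not spell out. The public identifier, the base URI and the content are hypothetical.

```xpath
function($publicId as xs:string?, $systemId as xs:string?) as map(xs:string, xs:string)? {
  (: The comparison is false when $publicId is the empty sequence. :)
  if ($publicId = '-//EXAMPLE//ENTITIES Demo//EN')   (: hypothetical public identifier :)
  then map {
    'base-uri' : 'urn:example:demo-entities',          (: hypothetical base URI :)
    'content'  : '<!ENTITY demo "replacement text">'   (: hypothetical content :)
  }
  else ()   (: empty sequence: the URI resolver is called next :)
}
```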

mlml:IGNORE-INLINE-DTD-PIS

  • QName: mlpf:IGNORE-INLINE-DTD-PIS ($mlml:IGNORE-INLINE-DTD-PIS)
  • Type: xs:boolean
  • Default: detected by underlying Xerces configuration

Processing instructions are allowed in the internal subset of a doctype declaration.

<!DOCTYPE root [
<?inline-dtd-pi?>
]>

If the value of the feature $mlml:IGNORE-INLINE-DTD-PIS is true, these PIs are ignored. Otherwise they are recognized as children of the root node.

mlml:CUSTOM-STRUCTUR-ELEMENTS

  • QName: mlpf:CUSTOM-STRUCTUR-ELEMENTS ($mlml:CUSTOM-STRUCTUR-ELEMENTS)
  • Type: function($result as element(mlml:document)) as element()*
  • Default: #unset

If set, the value must be a function that expects a provisional result of the parsing process as its first argument. The function may return additional XmlML elements whose content model should be treated as structured (not mixed content). This feature is used to override the default detection for whitespace stripping (see mlml:STRIP-WHITESPACE).

Note: the returned XmlML elements must be the original instances of the elements. Copies of elements in the result are ignored.
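
The point about original instances can be illustrated with a sketch. A path expression selects nodes from the provisional result itself, so the function returns the original elements rather than copies. The @name test and the names 'table' and 'row' are purely hypothetical and not taken from the XmlML format. The sketch also assumes that the mlml prefix is bound to the library's namespace, which this page does not state.

```xpath
function($result as element(mlml:document)) as element()* {
  (: Path expressions return the original nodes of $result; nothing is copied. :)
  (: The @name test and the values 'table' and 'row' are hypothetical. :)
  $result//*[@name = ('table', 'row')]
}
```

By contrast, elements rebuilt with an XSLT instruction such as xsl:copy-of, or otherwise newly constructed, would be new nodes and would be ignored.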

mlml:IGNORE-EXTERNAL-DTD

  • QName: mlpf:IGNORE-EXTERNAL-DTD ($mlml:IGNORE-EXTERNAL-DTD)
  • Type: xs:boolean
  • Default: false

Ignores the reference to an external DTD in the Doctype declaration.

mlml:IGNORE-INLINE-DTD

  • QName: mlpf:IGNORE-INLINE-DTD ($mlml:IGNORE-INLINE-DTD)
  • Type: xs:boolean
  • Default: false

Ignores the internal subset of a Doctype declaration.

mlml:IGNORE-UNDECLARED-ENTITIES

  • QName: mlpf:IGNORE-UNDECLARED-ENTITIES ($mlml:IGNORE-UNDECLARED-ENTITIES)
  • Type: xs:boolean
  • Default: false

If false, named entities without a declaration cause a parsing error. If true, no parsing error is thrown and an empty string is used as the value of the entity.
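
The DTD-related features can be combined in one config map. The sketch below skips the external DTD but keeps the internal subset. Entities declared only in the external DTD would then presumably count as undeclared, so the example also turns off the resulting parsing errors. The combination is illustrative, not a recommendation.

```xpath
(: Sketch only: one possible combination of the three features. :)
let $ns := 'http://www.nkutsche.com/xmlml/parser/features/'
return map {
  QName($ns, 'mlpf:IGNORE-EXTERNAL-DTD')        : true(),
  QName($ns, 'mlpf:IGNORE-INLINE-DTD')          : false(),
  QName($ns, 'mlpf:IGNORE-UNDECLARED-ENTITIES') : true()
}
```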

mlml:PARSER-LOG-LEVEL

  • QName: mlpf:PARSER-LOG-LEVEL ($mlml:PARSER-LOG-LEVEL)
  • Type: xs:string
  • Allowed values (corresponding global variables in parentheses):
    • VERBOSE ($mlml:LOG-LEVEL-VERBOSE)
    • DEBUG ($mlml:LOG-LEVEL-DEBUG)
    • WARNING ($mlml:LOG-LEVEL-WARN)
    • ERROR ($mlml:LOG-LEVEL-ERROR)

Log level of the parser log messages.
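
A sketch of setting the log level through the global variables listed above. It assumes that the mlml prefix is bound to the library's namespace, that the variables are in scope where the map is built, and that each log-level variable holds the corresponding string value. None of this is spelled out on this page.

```xpath
(: Assumes the $mlml:* variables are in scope. :)
map {
  $mlml:PARSER-LOG-LEVEL : $mlml:LOG-LEVEL-WARN
}
```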
