Improve parsing of frus-history index cross references #352

@joewiz

In the frus-history index (see the source TEI), range-based cross references use an unusual encoding. hsg-shell's ODD does not parse it the way our pre-TEI Publisher site did, and this seems to be causing server errors.

Here is a sample encoded cross reference:

<item>
    <term>Aandahl, Fredrick</term>, <ref target="#range(b_446-start,b_446-end)"
        >196–199</ref>, <ref target="#b_447">203</ref>, <ref target="#b_448"
        >206</ref>
</item>

The syntax used in the first of these three @target attributes is based on the TEI Guidelines' support for XPointer; I only use the range pointer scheme. Specifically, the cross reference points to the range between two <anchor> elements, identified by their @xml:id attributes, in the body of the book:

  1. Line 11548
    <anchor xml:id="b_446-start" corresp="#b_446-end"/>
  2. Line 11711
    <anchor xml:id="b_446-end" corresp="#b_446-start"/>
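
As a minimal sketch of how such a pointer can be taken apart (this is not the hsg-shell implementation), the standalone XQuery 3.1 below splits a `#range(...)` target into its two IDs and looks up the matching anchors. The `local:parse-range` helper, the inline sample document, and the `chapter_8` div ID are assumptions made for illustration only.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical helper: split "#range(a,b)" into its two IDs.
   Returns the empty sequence for anything that is not a two-part range pointer. :)
declare function local:parse-range($target as xs:string) as xs:string* {
    if (matches($target, '^#range\([^,\(\)]+,[^,\(\)]+\)$')) then
        tokenize(replace($target, '^#range\((.+)\)$', '$1'), ',') ! normalize-space(.)
    else
        ()
};

(: Assumed sample data; the real anchors live in the frus-history TEI source :)
let $doc :=
    document {
        <TEI xmlns="http://www.tei-c.org/ns/1.0">
            <text>
                <body>
                    <div xml:id="chapter_8">
                        <p><anchor xml:id="b_446-start" corresp="#b_446-end"/>Sample text</p>
                        <p>Sample text<anchor xml:id="b_446-end" corresp="#b_446-start"/></p>
                    </div>
                </body>
            </text>
        </TEI>
    }
let $ids := local:parse-range("#range(b_446-start,b_446-end)")
(: Compare @xml:id directly so the lookup does not depend on ID typing :)
let $start := $doc//tei:anchor[@xml:id eq $ids[1]]
let $end := $doc//tei:anchor[@xml:id eq $ids[2]]
return
    map {
        "ids": array { $ids },
        "start-div": string($start/ancestor::tei:div[1]/@xml:id),
        "end-div": string($end/ancestor::tei:div[1]/@xml:id),
        "same-div": $start/ancestor::tei:div[1] is $end/ancestor::tei:div[1]
    }
```

The predicate on `@xml:id` is used here instead of `id()` because whether `id()` resolves can depend on how the document was stored or constructed; the original code below relies on `id()` against the stored volume.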

My original handling for this, on our pre-TEI Publisher-based website, was to examine where the targets were located, and replace the book's original "196–199, 203, 206" with a web-relevant description of the target section, e.g., "Ch. 8 paras 34–39, Ch. 8 para 47, Ch. 8 para 52".

The Internet Archive contains a snapshot of the old rendering of the page.

"Ch. 8 paras 34–39, Ch. 8 para 47, Ch. 8 para 52" were given the URLs:

However, the current hsg site fails to parse the links correctly, generating URLs like this:

Our website performs a 302 redirect from these URLs, respectively, to:

... which appears to be a graceful recovery, but @windauer reported finding errors in the logs:

2019-12-20 10:40:09,297 [qtp731870416-10326] ERROR (DeferredFunctionCall.java [isEmpty]:203) - Exception in deferred function: not-found publication frus-history-monograph document frus-history section b_806 not found [at line 99, column 13, source: /db/apps/hsg-shell/modules/pages.xqm]
In function:
    pages:load-fallback-page(xs:string, xs:string, xs:string?) [85:13:/db/apps/hsg-shell/modules/pages.xqm]
    pages:load-xml(xs:string, xs:string, xs:string?, xs:string, xs:boolean?) [49:67:/db/apps/hsg-shell/modules/pages.xqm]
    pages:load(node(), map(*), xs:string?, xs:string?, xs:string?, xs:string, xs:boolean) [-1:-1:/db/apps/hsg-shell/modules/pages.xqm]
    templates:process-output(element(), map(*), item()*, element()) 
   ....

This error occurs about 10 times in a row, followed by:

2019-12-20 10:40:09,300 [qtp731870416-10326] WARN  (HttpChannel.java [handleException]:591) - /exist/apps/hsg-shell/historicaldocuments/frus-history/b_806 
javax.servlet.ServletException: javax.servlet.ServletException: An error occurred while processing request to /exist/apps/hsg-shell/historicaldocuments/frus-history/b_806: Committed
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:162) ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
        ...
    ... 18 more

Here is the original code I wrote to transform the links:

(: handle xpointer-style range references, as found in the frus-history, e.g.,
    index entries like: 
        <term>Washington, George</term>, <ref target="#range(b_37-start,b_37-end)">9–10</ref>
    point to:
        <anchor xml:id="b_37-start" corresp="#b_37-end"/>
    and:
        <anchor xml:id="b_37-end" corresp="#b_37-start"/>
:)
else if (starts-with($target, '#range')) then
    let $range := substring-after($target, '(')
    let $range := substring-before($range, ')')
    let $range := tokenize($range, ',')
    let $range-start := $range[1]
    let $range-end := $range[2]
    let $target-start-node := root($node)/id($range-start)
    let $target-end-node := root($node)/id($range-end)
    (: use ancestor notes to ensure linkability :)
    let $target-start-node := if ($target-start-node/ancestor::tei:note) then $target-start-node/ancestor::tei:note else $target-start-node
    let $target-end-node := if ($target-end-node/ancestor::tei:note) then $target-end-node/ancestor::tei:note else $target-end-node
    let $target-start-node-ancestor-div := $target-start-node/ancestor::tei:div[1]
    let $target-end-node-ancestor-div := $target-end-node/ancestor::tei:div[1]
    let $same-ancestor-divs := $target-start-node-ancestor-div is $target-end-node-ancestor-div
    (: use the ancestor chapter div's heading, e.g., "Chapter 9: ...", but chop off at the colon :)
    let $target-nodes := ($target-start-node, $target-end-node)
    let $target-divs := ($target-start-node-ancestor-div, $target-end-node-ancestor-div)
    let $target-node-labels := 
        let $both-notes := $target-nodes[1]/self::tei:note and $target-nodes[2]/self::tei:note
        let $one-note := $target-nodes[1]/self::tei:note or $target-nodes[2]/self::tei:note
        for $target-node at $n in $target-nodes
        let $ancestor-div-label :=
            if ($same-ancestor-divs and $n = 2) then
                ()
            else 
                string-join(functx:remove-elements-deep($target-divs[$n]/tei:head[1], 'note'), '')
        let $ancestor-div-label :=
            if (contains($ancestor-div-label, ':')) then substring-before($ancestor-div-label, ':') else $ancestor-div-label
        let $node-label :=
            if ($target-node/self::tei:note) then 
                concat(if ($n = 1 and $both-notes) then 'footnotes ' else 'footnote ', $target-node/@n)
            else
                (: paragraph-like-block-number :)
                concat(if ($one-note) then 'para ' else if ($n = 1) then 'paras ' else '', index-of($target-divs[$n]/*[not(self::tei:head)][not(self::tei:byline)][not(self::tei:p[@rend='sectiontitlebold'])], $target-node/ancestor::element()[parent::tei:div][1]))
        return
            string-join(($ancestor-div-label, $node-label), ' ')
    let $label :=
        replace(string-join($target-node-labels, '–'), 'Chapter', 'Ch.')
    let $target-node-destination-hash := 
        if ($target-start-node/self::tei:note) then
            concat('#fnref', substring-after($target-start-node/@xml:id, 'fn'))
        else
            concat('#', $range-start)
    return
        (: check to make sure the targets exist :)
        if ($target-start-node and $target-end-node) then
            element a { 
                attribute href { concat($abs-site-uri, $volume, '/', $target-start-node-ancestor-div/@xml:id, $target-node-destination-hash, $persistent-view) },
                $label 
                }
        (: display the label in case of malformed links :)
        else
            $label
(: handle single point references, as found in the frus-history, e.g.,
    index entries like:
     <term>Woodford, Stewart</term>, <ref target="#b_803">98</ref>
    point to:
     <anchor xml:id="b_803"/>
:)
else if (starts-with($target, '#b')) then
    let $url := substring-after($target, '#')
    let $target-node := root($node)/id($url)
    let $target-node := if ($target-node/ancestor::tei:note) then $target-node/ancestor::tei:note else $target-node
    let $destination-div := $target-node/ancestor::tei:div[1]
    (: use the ancestor chapter div's heading, e.g., "Chapter 9: ...", but chop off at the colon :)
    let $head := string-join(functx:remove-elements-deep($destination-div/tei:head[1], 'note'), '')
    let $target-node-label :=
        if ($target-node/self::tei:note) then 
            concat('footnote ', $target-node/@n)
        else
            concat('para ', index-of($destination-div/*[not(self::tei:head)][not(self::tei:byline)][not(self::tei:p[@rend='sectiontitlebold'])], $target-node/ancestor::element()[parent::tei:div][1]))
    let $label := replace(concat(if (contains($head, ':')) then substring-before($head, ':') else $head, ' ', $target-node-label), 'Chapter', 'Ch.')
    let $target-node-destination-hash := 
        if ($target-node/self::tei:note) then
            concat('#fnref', substring-after($target-node/@xml:id, 'fn'))
        else
            $target
    return
        if ($target-node) then 
            element a { 
                attribute href { concat($abs-site-uri, $volume, '/', $destination-div/@xml:id, $target-node-destination-hash, $persistent-view) },
                $label 
                }
        (: display the label in case of malformed links :)
        else 
            $label
else
    element a { 
        attribute href { concat($abs-site-uri, $volume, '/', substring-after($target, '#'), $persistent-view) }, 
        $type,
        render:recurse($node, $options) 
        }

We should research the logs to find the source of the error messages above, and, if needed, adapt the original link parsing code to our current ODD-based method for transforming TEI into HTML.
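
One possible starting point for that adaptation, offered only as a sketch: the logged "section b_806 not found" message suggests an anchor ID is being used where a section (div) ID is expected. The standalone XQuery 3.1 below maps an index @target to a `div-id#anchor-id` path and returns nothing when the target cannot be resolved, so the caller can fall back to a plain label. The `local:section-path` function, the sample document, and the `chapter_8` ID are hypothetical. Unlike the original code above, it ignores anchors inside footnotes.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical helper: turn an index @target into "div-id#anchor-id",
   or return the empty sequence when the target cannot be resolved. :)
declare function local:section-path($doc as document-node(), $target as xs:string) as xs:string? {
    let $id :=
        if (starts-with($target, '#range(')) then
            (: for a range, link to its start anchor :)
            substring-before(substring-after($target, '('), ',')
        else
            substring-after($target, '#')
    let $anchor := ($doc//tei:anchor[@xml:id eq $id])[1]
    let $div := $anchor/ancestor::tei:div[1]
    return
        if (exists($div/@xml:id)) then
            concat($div/@xml:id, '#', $id)
        else
            ()
};

(: Assumed sample data standing in for the frus-history volume :)
let $doc :=
    document {
        <TEI xmlns="http://www.tei-c.org/ns/1.0">
            <text>
                <body>
                    <div xml:id="chapter_8">
                        <p><anchor xml:id="b_446-start" corresp="#b_446-end"/>Sample text</p>
                        <p>Sample text<anchor xml:id="b_446-end" corresp="#b_446-start"/></p>
                        <p><anchor xml:id="b_447"/>Sample text</p>
                    </div>
                </body>
            </text>
        </TEI>
    }
(: The third target is deliberately unresolvable :)
for $target in ('#range(b_446-start,b_446-end)', '#b_447', '#b_999')
return
    $target || ' -> ' || (local:section-path($doc, $target), '[unresolved]')[1]
```

Resolving the containing div before building the URL means the section part of the path is always a real section ID, which should avoid requests for paths such as `frus-history/b_806`. How such a function would be wired into the ODD is left open here.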
