Description
In the frus-history index (see the source TEI), range-based cross references are encoded in a way that hsg-shell's ODD does not parse the same way our pre-TEI Publisher site did, and this seems to be causing server errors.
Here is a sample encoded cross reference:
<item>
<term>Aandahl, Fredrick</term>, <ref target="#range(b_446-start,b_446-end)"
>196–199</ref>, <ref target="#b_447">203</ref>, <ref target="#b_448"
>206</ref>
</item>
The syntax used in the first of these @target attributes is based on the TEI Guidelines' support for XPointer; I only use the range pointer scheme. Specifically, the cross reference points to the range between two <anchor> elements with @xml:id attributes in the body of the book:
- Line 11548
<anchor xml:id="b_446-start" corresp="#b_446-end"/>
- Line 11711
<anchor xml:id="b_446-end" corresp="#b_446-start"/>
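The two forms of @target shown above can be unpacked with plain string functions. The following is a minimal sketch, assuming XQuery 3.1; `local:target-ids` is a hypothetical helper name used only for illustration and is not a function in hsg-shell:

```xquery
xquery version "3.1";

(: Hypothetical helper, not part of hsg-shell: return the xml:id values
   that a @target value points to. :)
declare function local:target-ids($target as xs:string) as xs:string* {
    if (matches($target, '^#range\(.+,.+\)$')) then
        (: "#range(b_446-start,b_446-end)" yields the two ids b_446-start and b_446-end :)
        let $args := replace($target, '^#range\((.+)\)$', '$1')
        return tokenize($args, ',') ! normalize-space(.)
    else if (starts-with($target, '#')) then
        (: "#b_447" yields the single id b_447 :)
        substring-after($target, '#')
    else
        (: anything else is not a local pointer :)
        ()
};

local:target-ids('#range(b_446-start,b_446-end)'),
local:target-ids('#b_447')
```

Checking for the range scheme first matters here, because a test on the leading `#` alone would treat `range(b_446-start,b_446-end)` as if it were an ID.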
My original handling for this, on our pre-TEI Publisher-based website, was to examine where the targets were located, and replace the book's original "196–199, 203, 206" with a web-relevant description of the target section, e.g., "Ch. 8 paras 34–39, Ch. 8 para 47, Ch. 8 para 52".
The Internet Archive contains a snapshot of the old rendering of the page.
"Ch. 8 paras 34–39, Ch. 8 para 47, Ch. 8 para 52" were given the URLs:
- https://history.state.gov/historicaldocuments/frus-history/chapter-8#b_446-start
- https://history.state.gov/historicaldocuments/frus-history/chapter-8#b_447
- https://history.state.gov/historicaldocuments/frus-history/chapter-8#b_448
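The URLs above combine the @xml:id of the div containing the anchor with the anchor's own ID as the fragment. A minimal sketch of that lookup follows, assuming XQuery 3.1. The helper name `local:section-href` and the tiny inline document are illustrative assumptions, not hsg-shell code or real frus-history data:

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical helper: build "div-id#anchor-id" for an xml:id in $doc.
   Returns the empty sequence when the id is missing or has no ancestor
   div with an xml:id, so the caller can fall back to plain text. :)
declare function local:section-href($doc as node(), $id as xs:string) as xs:string? {
    let $anchor := $doc//*[@xml:id = $id]
    let $div := $anchor/ancestor::tei:div[1]
    return
        if ($div/@xml:id) then
            $div/@xml:id || '#' || $id
        else
            ()
};

(: Simplified stand-in for the book's TEI, for illustration only :)
let $doc :=
    <TEI xmlns="http://www.tei-c.org/ns/1.0">
        <text><body>
            <div xml:id="chapter-8">
                <p><anchor xml:id="b_446-start" corresp="#b_446-end"/>text</p>
                <p><anchor xml:id="b_447"/>text</p>
            </div>
        </body></text>
    </TEI>
return (
    local:section-href($doc, 'b_446-start'),
    local:section-href($doc, 'b_447'),
    local:section-href($doc, 'no-such-id')
)
```

The sketch compares @xml:id values directly rather than calling id(), so that it does not depend on how a given processor recognizes ID attributes; the original code further down uses root($node)/id(...).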
However, the current hsg site fails to parse the links correctly, generating URLs like this:
- https://history.state.gov/historicaldocuments/frus-history/range(b_446-start,b_446-end)
- https://history.state.gov/historicaldocuments/frus-history/b_447
- https://history.state.gov/historicaldocuments/frus-history/b_448
Our website performs a 302 redirect when these URLs are requested, sending them, respectively, to:
- https://history.state.gov/historicaldocuments/frus-history/chapter-8#b_446-start
- https://history.state.gov/historicaldocuments/frus-history/chapter-8#b_447
- https://history.state.gov/historicaldocuments/frus-history/chapter-8#b_448
... which appears to be a graceful recovery, but @windauer reported finding errors in the logs:
2019-12-20 10:40:09,297 [qtp731870416-10326] ERROR (DeferredFunctionCall.java [isEmpty]:203) - Exception in deferred function: not-found publication frus-history-monograph document frus-history section b_806 not found [at line 99, column 13, source: /db/apps/hsg-shell/modules/pages.xqm]
In function:
pages:load-fallback-page(xs:string, xs:string, xs:string?) [85:13:/db/apps/hsg-shell/modules/pages.xqm]
pages:load-xml(xs:string, xs:string, xs:string?, xs:string, xs:boolean?) [49:67:/db/apps/hsg-shell/modules/pages.xqm]
pages:load(node(), map(*), xs:string?, xs:string?, xs:string?, xs:string, xs:boolean) [-1:-1:/db/apps/hsg-shell/modules/pages.xqm]
templates:process-output(element(), map(*), item()*, element())
....
This error occurs about 10 times in a row, followed by:
2019-12-20 10:40:09,300 [qtp731870416-10326] WARN (HttpChannel.java [handleException]:591) - /exist/apps/hsg-shell/historicaldocuments/frus-history/b_806
javax.servlet.ServletException: javax.servlet.ServletException: An error occurred while processing request to /exist/apps/hsg-shell/historicaldocuments/frus-history/b_806: Committed
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:162) ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
...
... 18 more
Here is the original code I wrote to transform the links:
(: handle xpointer-style range references, as found in the frus-history, e.g.,
   index entries like:
       <term>Washington, George</term>, <ref target="#range(b_37-start,b_37-end)">9–10</ref>
   point to:
       <anchor xml:id="b_37-start" corresp="#b_37-end"/>
   and:
       <anchor xml:id="b_37-end" corresp="#b_37-start"/>
:)
else if (starts-with($target, '#range')) then
    let $range := substring-after($target, '(')
    let $range := substring-before($range, ')')
    let $range := tokenize($range, ',')
    let $range-start := $range[1]
    let $range-end := $range[2]
    let $target-start-node := root($node)/id($range-start)
    let $target-end-node := root($node)/id($range-end)
    (: use ancestor notes to ensure linkability :)
    let $target-start-node := if ($target-start-node/ancestor::tei:note) then $target-start-node/ancestor::tei:note else $target-start-node
    let $target-end-node := if ($target-end-node/ancestor::tei:note) then $target-end-node/ancestor::tei:note else $target-end-node
    let $target-start-node-ancestor-div := $target-start-node/ancestor::tei:div[1]
    let $target-end-node-ancestor-div := $target-end-node/ancestor::tei:div[1]
    let $same-ancestor-divs := $target-start-node-ancestor-div is $target-end-node-ancestor-div
    (: use the ancestor chapter div's heading, e.g., "Chapter 9: ...", but chop off at the colon :)
    let $target-nodes := ($target-start-node, $target-end-node)
    let $target-divs := ($target-start-node-ancestor-div, $target-end-node-ancestor-div)
    let $target-node-labels :=
        let $both-notes := $target-nodes[1]/self::tei:note and $target-nodes[2]/self::tei:note
        let $one-note := $target-nodes[1]/self::tei:note or $target-nodes[2]/self::tei:note
        for $target-node at $n in $target-nodes
        let $ancestor-div-label :=
            if ($same-ancestor-divs and $n = 2) then
                ()
            else
                string-join(functx:remove-elements-deep($target-divs[$n]/tei:head[1], 'note'), '')
        let $ancestor-div-label :=
            if (contains($ancestor-div-label, ':')) then substring-before($ancestor-div-label, ':') else $ancestor-div-label
        let $node-label :=
            if ($target-node/self::tei:note) then
                concat(if ($n = 1 and $both-notes) then 'footnotes ' else 'footnote ', $target-node/@n)
            else
                (: paragraph-like-block-number :)
                concat(if ($one-note) then 'para ' else if ($n = 1) then 'paras ' else '', index-of($target-start-node-ancestor-div/*[not(self::tei:head)][not(self::tei:byline)][not(self::tei:p[@rend='sectiontitlebold'])], $target-node/ancestor::element()[parent::tei:div][1]))
        return
            string-join(($ancestor-div-label, $node-label), ' ')
    let $label :=
        replace(string-join($target-node-labels, '–'), 'Chapter', 'Ch.')
    let $target-node-destination-hash :=
        if ($target-start-node/self::tei:note) then
            concat('#fnref', substring-after($target-start-node/@xml:id, 'fn'))
        else
            concat('#', $range-start)
    return
        (: check to make sure the targets exist :)
        if ($target-start-node and $target-end-node) then
            element a {
                attribute href { concat($abs-site-uri, $volume, '/', $target-start-node-ancestor-div/@xml:id, $target-node-destination-hash, $persistent-view) },
                $label
            }
        (: display the label in case of malformed links :)
        else
            $label
(: handle single point references, as found in the frus-history, e.g.,
   index entries like:
       <term>Woodford, Stewart</term>, <ref target="#b_803">98</ref>
   point to:
       <anchor xml:id="b_803"/>
:)
else if (starts-with($target, '#b')) then
    let $url := substring-after($target, '#')
    let $target-node := root($node)/id($url)
    let $target-node := if ($target-node/ancestor::tei:note) then $target-node/ancestor::tei:note else $target-node
    let $destination-div := $target-node/ancestor::tei:div[1]
    (: use the ancestor chapter div's heading, e.g., "Chapter 9: ...", but chop off at the colon :)
    let $head := string-join(functx:remove-elements-deep($destination-div/tei:head[1], 'note'), '')
    let $target-node-label :=
        if ($target-node/self::tei:note) then
            concat('footnote ', $target-node/@n)
        else
            concat('para ', index-of($destination-div/*[not(self::tei:head)][not(self::tei:byline)][not(self::tei:p[@rend='sectiontitlebold'])], $target-node/ancestor::element()[parent::tei:div][1]))
    let $label := replace(concat(if (contains($head, ':')) then substring-before($head, ':') else $head, ' ', $target-node-label), 'Chapter', 'Ch.')
    let $target-node-destination-hash :=
        if ($target-node/self::tei:note) then
            concat('#fnref', substring-after($target-node/@xml:id, 'fn'))
        else
            $target
    return
        if ($target-node) then
            element a {
                attribute href { concat($abs-site-uri, $volume, '/', $destination-div/@xml:id, $target-node-destination-hash, $persistent-view) },
                $label
            }
        (: display the label in case of malformed links :)
        else
            $label
else
    element a {
        attribute href { concat($abs-site-uri, $volume, '/', substring-after($target, '#'), $persistent-view) },
        $type,
        render:recurse($node, $options)
    }
We should research the logs to find the source of the error messages above and, if needed, adapt the original link-parsing code to our current ODD-based method for transforming TEI into HTML.