Bare bones add #23
bertsky left a comment
I am willing to help make ocrd-import more versatile, but I don't fully understand your use case (especially the P_ change).
function debug { ocrd log -n ocrd-import debug "$1"; }
function critical { echo critical "$1"; }
If the OCR-D logging facility is too inefficient, I suggest using a proper mechanism to override it via an additional switch, for example by aliasing either ocrd log -n ocrd-import or echo to a common log backend.
Yes, this was just a quick hack to reduce the overhead of calling ocrd log.
For example, by aliasing ocrd log -n ocrd-import or echo to a common log backend.
How do you mean, aliasing?
How do you mean, aliasing?
I meant:
alias log="ocrd log -n ocrd-import"
...
((fastlog)) && alias log=echo
...
function debug { log debug "$1"; }
...
function critical { log critical "$1"; }
🎉 That is a nice solution, and configurable too.
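A minimal bash sketch of the configurable backend, with one deliberate swap: instead of an alias, the backend is a command array invoked through a small log function. Bash does not expand aliases in non-interactive scripts unless shopt -s expand_aliases is set, so a function sidesteps that pitfall. The fastlog variable comes from the snippet above; how it gets set (for example by a command-line switch) is an assumption here, as is the LOG_CMD name.

```shell
#!/usr/bin/env bash
# Sketch: one configurable backend for all log levels.
# Assumption: fastlog is set elsewhere (e.g. by a hypothetical command-line switch).

if (( ${fastlog:-0} )); then
    LOG_CMD=(echo)                       # fast path: plain stdout
else
    LOG_CMD=(ocrd log -n ocrd-import)    # default: OCR-D logging facility
fi

# A function instead of an alias: it works in non-interactive scripts
# without `shopt -s expand_aliases`, and LOG_CMD is looked up at call time.
function log { "${LOG_CMD[@]}" "$@"; }

function debug    { log debug "$1"; }
function critical { log critical "$1"; }
```

With fastlog=1, debug "hello" simply prints debug hello; otherwise it runs ocrd log -n ocrd-import debug "hello", exactly as the original debug function did.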
set -e
trap rollback ERR
page=p${zeros:0:$((4-${#num}))}$num
echo "PAGE=$page"
I think this was just debugging output by @stefanCCS; it can be removed without replacement.
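The zero-padding expression in page=p${zeros:0:$((4-${#num}))}$num is compact but easy to misread. A small bash sketch of what it computes, assuming zeros holds the string 0000 (its definition is not shown in the diff); pad_page and pad_page_printf are hypothetical helper names. The printf variant is a common alternative, where 10# forces base 10 so that inputs with leading zeros such as 0008 are not parsed as octal.

```shell
#!/usr/bin/env bash
# Assumption: zeros holds at least four '0' characters.
zeros=0000

# Same expression as in the diff: take (4 - number of digits) zeros,
# then append num. Only meant for numbers with at most four digits.
pad_page() {
    local num=$1
    echo "p${zeros:0:$((4-${#num}))}$num"
}

# Alternative with printf; 10# avoids octal parsing of leading zeros.
# printf never truncates: numbers wider than four digits are printed as-is.
pad_page_printf() {
    local num=$1
    printf 'p%04d\n' "$((10#$num))"
}

pad_page 7          # p0007
pad_page_printf 42  # p0042
```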
if ! [[ ${base:0:1} =~ [a-zA-Z] ]]; then
base=f${base}
#base=P_${base}
a=0 # just do something to have a correct syntax for this 'if'
Non-XS-compliant page names are a problem regardless of whether numpageid is used. I don't see why this branch should be abandoned. (But if you do, : is the sh no-op command.)
Not sure what the intention was here. @stefanCCS?
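A small bash sketch of both points from the review above, assuming the f prefix from the diff. The branch keeps IDs XS-compliant (an xs:ID, like any NCName, must not start with a digit), and if the branch were ever disabled, : keeps the if syntactically valid without a dummy assignment such as a=0. The helper names to_xml_id and to_xml_id_disabled are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix basenames that do not start with a letter,
# so the result can serve as an xs:ID (which must not start with a digit).
to_xml_id() {
    local base=$1
    if ! [[ ${base:0:1} =~ [a-zA-Z] ]]; then
        base=f${base}
    fi
    echo "$base"
}

# Hypothetical variant with the branch disabled: ':' is the no-op builtin,
# so the 'if' stays valid without a placeholder such as a=0.
to_xml_id_disabled() {
    local base=$1
    if ! [[ ${base:0:1} =~ [a-zA-Z] ]]; then
        :   # base=f${base}
    fi
    echo "$base"
}

to_xml_id 0001     # f0001
to_xml_id page_01  # page_01
```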
done
#case "$mimetype" in
# ${MIMETYPE_PAGE})
If you want a version with next to no checks, the right approach would be to add a switch that skips the clash check for the resulting pages/files. Everything else is either already fast (like the PAGE vs. ALTO check for .xml) or can be deactivated (like the --no-convert switch).
Yes, this was very broad commenting-out. The "offending" call that slows down the import is:
clashes=($(ocrd workspace find -i "$base" -k local_filename -k mimetype -k pageId))
#34 includes a Pythonic version of ocrd-import that (among many other improvements) handles this via
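A minimal bash sketch of the switch suggested in the review, guarding the clash check. The option name --skip-clash-check, the variable skip_clash_check, the helpers parse_args and check_clashes, and the clash-reporting logic are all hypothetical; only the ocrd workspace find call is taken from this discussion. With the switch set, the lookup is skipped entirely, so no ocrd process is spawned per file.

```shell
#!/usr/bin/env bash
# Hypothetical switch: skip the per-file clash lookup when set.
skip_clash_check=0

# Hypothetical option parsing; only the new switch is shown.
parse_args() {
    while (( $# )); do
        case "$1" in
            --skip-clash-check) skip_clash_check=1 ;;
            *) break ;;  # stop at the first non-option argument
        esac
        shift
    done
}

# Hypothetical check: returns non-zero if the workspace already has
# a file with ID $1.
check_clashes() {
    local base=$1
    local -a clashes=()
    if (( skip_clash_check )); then
        return 0  # caller vouches that IDs are unique; no lookup
    fi
    # The expensive call: one `ocrd workspace find` process per file.
    clashes=($(ocrd workspace find -i "$base" -k local_filename -k mimetype -k pageId))
    if (( ${#clashes[@]} )); then
        echo "clash for $base: ${clashes[*]}" >&2
        return 1
    fi
}
```

The design choice here is to make the fast path opt-in, so the default behaviour keeps the safety check.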
These changes are not meant to be taken over as-is; they are just adaptations toward make_file_id-compatible IDs in ocrd-import. I just wanted to open a draft PR as a basis for discussion.