
Porting to Python... #2

@dddjjjbbb

Hey @Syniurge,

I've been trying to port over the extraction logic from your script to Python but am having a few issues understanding Sony's logic. Perhaps you could help me out? Sorry to drag you back to code that's over a decade old :)

Let's say I have a single .annot file with one fragment:

<fragment start="OEBPS/farrokhzad_let_us_believe_text-10.xhtml#point(/1/4/2/18/1:15)" end="OEBPS/farrokhzad_let_us_believe_text-10.xhtml#point(/1/4/2/20/1:42)"/>

Once I extract the start and end attributes I have this:

[Annotation(start=Point(filename='OEBPS/farrokhzad_let_us_believe_text-10.xhtml', node_indexes=[4, 2, 18, 1], byte=15), end=Point(filename='OEBPS/farrokhzad_let_us_believe_text-10.xhtml', node_indexes=[4, 2, 20, 1], byte=42))]
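For reference, here is a minimal sketch of an extraction that reproduces the output above. The regex, the `parse_point` and `parse_fragment` helper names, and the choice to drop the leading `/1` step are all assumptions made to match that output; none of it is taken from the original script, and whether the leading step should be dropped is part of the question. It also assumes the `<fragment>` element is parsed on its own, without XML namespaces.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Matches "<file>#point(/a/b/c:offset)". Group names are illustrative only.
POINT_RE = re.compile(
    r"^(?P<filename>[^#]+)#point\((?P<path>(?:/\d+)+):(?P<offset>\d+)\)$"
)


@dataclass
class Point:
    filename: str
    node_indexes: List[int]
    byte: int  # named "byte" to mirror the output above; its unit is unconfirmed


@dataclass
class Annotation:
    start: Point
    end: Point


def parse_point(value: str) -> Point:
    """Split one start/end attribute into filename, path steps, and offset."""
    match = POINT_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognised point expression: {value!r}")
    steps = [int(s) for s in match.group("path").split("/") if s]
    # Assumption: drop the first step (1 in the sample) to match the output above.
    return Point(match.group("filename"), steps[1:], int(match.group("offset")))


def parse_fragment(xml_text: str) -> Annotation:
    """Parse a single <fragment start="..." end="..."/> element."""
    elem = ET.fromstring(xml_text)
    return Annotation(parse_point(elem.attrib["start"]),
                      parse_point(elem.attrib["end"]))
```

Calling `parse_fragment` on the fragment string above should yield one `Annotation` whose `start` and `end` carry the same filename, node indexes, and offsets as listed.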

First question, does the regex extraction seem correct? Are those the values you'd expect for those particular node_indexes?

Second question, can you explain how the node_indexes and byte values work in the context of parsing the corresponding xhtml?

It's not totally clear to me. If it's easier and you have access, perhaps you could point me to some documentation?

Any help would be greatly appreciated :)
