Porting to Python...

Hey @Syniurge,

I've been trying to port the extraction logic from your script to Python, but I'm having trouble understanding Sony's logic. Perhaps you could help me out? Sorry to drag you back to code that's over a decade old :)

Let's say I have a single `.annot` file with one fragment:

`<fragment start="OEBPS/farrokhzad_let_us_believe_text-10.xhtml#point(/1/4/2/18/1:15)" end="OEBPS/farrokhzad_let_us_believe_text-10.xhtml#point(/1/4/2/20/1:42)"/>`

Once I extract the start and end attributes I have this:

`[Annotation(start=Point(filename='OEBPS/farrokhzad_let_us_believe_text-10.xhtml', node_indexes=[4, 2, 18, 1], byte=15), end=Point(filename='OEBPS/farrokhzad_let_us_believe_text-10.xhtml', node_indexes=[4, 2, 20, 1], byte=42))]`
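For context, here is a minimal sketch of an extraction that would yield the values above. The `Point` and `Annotation` names come from the output shown; the regex patterns, the helper names `parse_point` and `parse_fragments`, and the choice to drop the leading `/1` step are assumptions made only to reproduce that output, not necessarily what the original script does.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Point:
    filename: str
    node_indexes: List[int]
    byte: int


@dataclass
class Annotation:
    start: Point
    end: Point


# Assumed shape of a point reference: "<file>#point(/a/b/c:offset)"
POINT_RE = re.compile(
    r"^(?P<filename>[^#]+)#point\((?P<path>(?:/\d+)+):(?P<offset>\d+)\)$"
)

# Assumed shape of a fragment element with start and end attributes, in that order
FRAGMENT_RE = re.compile(
    r'<fragment\s+start="(?P<start>[^"]+)"\s+end="(?P<end>[^"]+)"\s*/>'
)


def parse_point(value: str) -> Point:
    """Split one start/end attribute into filename, node steps, and offset."""
    match = POINT_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized point reference: {value!r}")
    steps = [int(step) for step in match.group("path").split("/")[1:]]
    # Assumption: the leading step (always 1 in the sample) is dropped,
    # which is what the output shown above appears to do.
    return Point(match.group("filename"), steps[1:], int(match.group("offset")))


def parse_fragments(annot_text: str) -> List[Annotation]:
    """Collect every <fragment .../> found in the text of an .annot file."""
    return [
        Annotation(parse_point(m.group("start")), parse_point(m.group("end")))
        for m in FRAGMENT_RE.finditer(annot_text)
    ]
```

Calling `parse_fragments` on the fragment above gives one `Annotation` whose start has `node_indexes=[4, 2, 18, 1]` and `byte=15`. A real port would probably use an XML parser rather than a regex for the `<fragment>` element; the regex here just keeps the sketch short.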

First question: does the regex extraction seem correct? Are those the values you'd expect for `node_indexes` in this particular fragment?

Second question: can you explain how the `node_indexes` and `byte` values are used when parsing the corresponding XHTML?

It's not totally clear to me. If it's easier and you have access, perhaps you could point me to some documentation?

Any help would be greatly appreciated :)
