Adding the tokenizer contents to a FoLiA doc #14

@pirolen

Is there a straightforward way to take the sentences and tokens produced by the tokenizer and build a new FoLiA doc from them?
It is not clear to me how to do that with the add method. Is one supposed to iterate over the sentences and tokens from the tokenizer (which yields Token objects) and then create the annotations by scripting, e.g. reading each token's class and passing it to a folia.Word annotation? Or is there a direct way to add the tokenizer's content structure to the FoLiA doc?

Or is python-ucto not meant to be used for that, and should one instead first create a FoLiA doc with untokenized content and run the ucto CLI on it?
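
To make the "by scripting" route concrete, here is a minimal sketch of what I have in mind. It is not a confirmed or recommended approach. It assumes that iterating over the tokenizer yields tokens supporting `str(token)`, `token.type` and `token.isendofsentence()`, and that the FoLiA Python library is importable as `folia.main` and assigns IDs automatically when elements are added. The helper names `group_sentences` and `build_folia` are my own, and depending on the library version the annotation types may need to be declared explicitly.

```python
def group_sentences(tokens):
    """Group a flat token stream into sentences.

    `tokens` is any iterable of objects that support str(), a `.type`
    attribute and an `.isendofsentence()` method (assumed here to match
    what python-ucto yields). Returns a list of sentences, each a list
    of (text, type) pairs.
    """
    sentences, current = [], []
    for token in tokens:
        current.append((str(token), token.type))
        if token.isendofsentence():
            sentences.append(current)
            current = []
    if current:  # trailing tokens without an end-of-sentence marker
        sentences.append(current)
    return sentences


def build_folia(sentences, doc_id="example"):
    """Build a FoLiA document from grouped sentences (hypothetical helper).

    The import is done lazily so that group_sentences() can be used and
    tested without the FoLiA library installed.
    """
    import folia.main as folia  # assumption: FoLiA library is installed

    doc = folia.Document(id=doc_id)
    text = doc.add(folia.Text)
    for sent in sentences:
        sentence = text.add(folia.Sentence)
        for word_text, _token_type in sent:
            # The token type is ignored in this sketch; only the text is added.
            sentence.add(folia.Word, word_text)
    return doc


# Intended usage (not run here; needs ucto and the FoLiA library):
#   import ucto
#   tokenizer = ucto.Tokenizer("tokconfig-nld")
#   tokenizer.process("Dit is een zin. Dit is nog een zin.")
#   doc = build_folia(group_sentences(tokenizer))
#   doc.save("example.folia.xml")
```

This drops the token type and spacing information, which is part of why I am asking whether a more direct route exists.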
