Question: possible to retrieve untokenized sentences? #10

This may sound silly, but would it be possible to add a method that retrieves sentences from the tokenizer without the whitespace it inserts around punctuation marks (i.e. untokenized)? For example, it could return a tuple holding two versions of each sentence: the tokenized one and the original.
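To make the tuple idea concrete, here is a minimal sketch in Python. It assumes a deliberately simple regex tokenizer, not this project's actual rules, and `split_with_originals`, `TOKEN_RE` and `SENT_RE` are hypothetical names. The key point is that the original sentence is sliced directly out of the input text instead of being rebuilt from the tokens.

```python
import re
from typing import List, Tuple

# Assumed, simplified rules (hypothetical, not the project's real tokenizer):
# a token is a run of word characters or a single punctuation mark,
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
# and a sentence runs up to the next '.', '!' or '?' (or the end of the text).
SENT_RE = re.compile(r"\S[^.!?]*(?:[.!?]+|$)")


def split_with_originals(text: str) -> List[Tuple[str, str]]:
    """Return one (tokenized, original) pair per sentence in `text`."""
    pairs = []
    for match in SENT_RE.finditer(text):
        # Slice the sentence straight out of the input, so the original
        # spacing and punctuation are preserved exactly.
        original = match.group().rstrip()
        # Tokenize that same slice and join with spaces for the tokenized view.
        tokenized = " ".join(TOKEN_RE.findall(original))
        pairs.append((tokenized, original))
    return pairs


for tokenized, original in split_with_originals("Hello, world! How are you?"):
    print(tokenized, "|", original)
```

This prints `Hello , world ! | Hello, world!` and `How are you ? | How are you?`. A real tokenizer would more likely record character offsets for each token and sentence, which yields the same pair without a second pass.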

Keeping the untokenized sentence is practical in some scenarios (e.g. when showing sentences to end users), and reconstructing it with a script afterwards would, I guess, be rather hacky and imprecise.
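As an illustration of why script-based reconstruction is imprecise, the hedged sketch below tokenizes a few sentences with an assumed regex tokenizer, then applies a hypothetical "drop the space before every punctuation mark" fix-up (`naive_detokenize`). Both helpers are illustrative assumptions, not part of any real library.

```python
import re

# Assumed illustrative tokenizer: words, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence: str) -> str:
    """Join tokens with single spaces, as a typical tokenizer output would."""
    return " ".join(TOKEN_RE.findall(sentence))


def naive_detokenize(tokenized: str) -> str:
    """Hypothetical script-level fix-up: remove the space before punctuation."""
    return re.sub(r" ([^\w\s])", r"\1", tokenized)


for original in ["Hello, world!", 'She said "hi".', "3 - 2 = 1."]:
    restored = naive_detokenize(tokenize(original))
    status = "ok" if restored == original else "MISMATCH"
    print(repr(original), "->", repr(restored), status)
```

Only the first sentence survives the round trip. Quotes come back as `She said" hi".` and the arithmetic as `3- 2= 1.`, because the tokenized form no longer records where the original whitespace was.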
