Possible tokenization bug that affects answer accuracy.

**Describe the bug**
When running **Tutorial 3** with multiple models and asking the question **"Who killed Ned Stark?"**, the first result in almost all models is:
**1st Answer**  `"n. When Ned\'s father and brother went south to reclaim her, the "Mad King" Aerys Targaryen burned both of them alive. Ned and Robert Baratheon led the"`

Some of the 2nd and 3rd answers are actually correct (especially with `deepset/minilm-uncased-squad2` and `deepset/electra-base-squad2`), but the top answer is factually incorrect. It's a shame, because the rest of the answers are good.

I think the reason could be bad tokenization in some cases. Here the text reads `Ned\'s father and brother`, and maybe the algorithm thinks that Ned is one of the characters in this story who is later burned alive, when in reality it is **Ned's** father and brother who are burned.

How can we remove this `\` from the text so that the model starts understanding the text better?
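Before stripping anything, it may be worth checking whether the backslash is really in the document text. Python's `repr()` shows `\'` whenever a string contains both `'` and `"`, so the backslash may only be a display artifact of how the answer dict is printed. The snippet below is a minimal sketch of that check, plus a hypothetical helper (`strip_escaped_quotes` is not part of Haystack) that removes literal backslash-escaped quotes in case they do exist in the raw text:

```python
def strip_escaped_quotes(text: str) -> str:
    """Hypothetical helper: turn literal \\' and \\" sequences into bare quotes."""
    return text.replace("\\'", "'").replace('\\"', '"')


# This string mixes both quote types but contains no literal backslash.
clean = 'When Ned\'s father and brother went south, the "Mad King" burned them.'

# repr() escapes the apostrophe for display; print() shows the real text.
print(repr(clean))
print(clean)
print("\\" in clean)  # checks whether a backslash is actually stored

# If the raw document really does contain a backslash, strip it before indexing.
dirty = "When Ned\\'s father and brother went south"
print(strip_escaped_quotes(dirty))
```

If the check shows no backslash in the stored text, the tokenizer never sees one, and the ranking problem would have to come from somewhere else.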

**Error message**
Error that was thrown (if available)

**Expected behavior**
A clear and concise description of what you expected to happen.

**Additional context**
Add any other context about the problem here, like document types / preprocessing steps / settings of reader etc.

**To Reproduce**
Steps to reproduce the behavior

**System:**
 - OS: Colab Notebook
 - GPU/CPU:
 - Haystack version (commit or version number): 0.4.0
 - DocumentStore:
 - Reader:
 - Retriever:


Possible tokenization bug that affect answers accuracy. #479
