Some documents have no space between a name and one of the following characters: []{},.<>. It would make sense to add an option that treats these characters as token separators.
Another case that comes up quite often is names like <i>Aus bus</i> Linn. It would be good to ignore <i> and </i>, or even use them as indicators of the canonical form of a scientific name.
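
To make the request concrete, here is a minimal sketch of both behaviours in Python. It is not the project's tokenizer: the names `SEPARATORS`, `tokenize`, `italic_candidates`, and the `split_on_separators` option are all hypothetical and only illustrate the idea.

```python
import re

# Hypothetical separator set, taken from the characters listed in this issue.
SEPARATORS = "[]{},.<>"

# Matches opening and closing italic tags, e.g. <i> and </i>.
ITALIC_TAG = re.compile(r"</?i>", re.IGNORECASE)

# Captures the text between a pair of italic tags.
ITALIC_SPAN = re.compile(r"<i>(.*?)</i>", re.IGNORECASE | re.DOTALL)


def italic_candidates(text):
    """Return the text of each <i>...</i> span as a possible canonical form."""
    return [m.group(1).strip() for m in ITALIC_SPAN.finditer(text)]


def tokenize(text, split_on_separators=True):
    """Split text into tokens, ignoring italic tags.

    Tags are removed first: '<' and '>' are separators too, so splitting
    first would leave stray 'i' tokens behind.
    """
    text = ITALIC_TAG.sub(" ", text)
    if split_on_separators:
        # Assumption: every separator is simply replaced by a space.
        text = re.sub("[" + re.escape(SEPARATORS) + "]", " ", text)
    return text.split()


print(tokenize("<i>Aus bus</i>Linn.,[1]"))        # ['Aus', 'bus', 'Linn', '1']
print(italic_candidates("<i>Aus bus</i> Linn."))  # ['Aus bus']
```

One consequence of treating `.` as a separator is that abbreviated authorships such as "Linn." lose their period, which is one reason this probably needs to be an option rather than the default.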
See also
#150
#53