Closed
Description
from iamsystem import Matcher

matcher = Matcher.build(
    keywords=["cancer"]
)
text = "cancer cancer"
annots = matcher.annot_text(text=text)
for annot in annots:
    print(annot)
# cancer 0 6 cancer

It outputs a single annotation even though the word 'cancer' occurs twice. This behavior was explained in a comment in the code:
Don't create multiple annotations for the same transition. For example 'cancer cancer' with keyword 'cancer': if an annotation was created for the first 'cancer' occurrence, don't create a new one for the second occurrence.
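The effect of that rule can be mimicked with a toy matcher in plain Python. This is only an illustration and not iamsystem's actual implementation: `annotate_once` and its `seen` set are hypothetical stand-ins for the transition check, and tokens are assumed to be separated by a single space.

```python
def annotate_once(tokens, keywords):
    """Toy matcher (hypothetical): emit at most one annotation per keyword."""
    seen = set()   # keywords that already produced an annotation
    annots = []
    offset = 0
    for token in tokens:
        start, end = offset, offset + len(token)
        if token in keywords and token not in seen:
            annots.append((token, start, end))
            seen.add(token)
        offset = end + 1  # assumption: single-space separator
    return annots


print(annotate_once("cancer cancer".split(), {"cancer"}))
# [('cancer', 0, 6)]
```

The second 'cancer' is skipped only because the first one already produced an annotation, which mirrors the single annotation reported above.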
The rationale was to avoid the creation of two annotations for repeated words when the window is large:
from iamsystem import Matcher

matcher = Matcher.build(
    keywords=["cancer de prostate"],
    w=20
)
text = "cancer de prostate token token token token prostate"
annots = matcher.annot_text(text=text)
for annot in annots:
    print(annot)
# cancer de prostate 0 18 cancer de prostate

However, this is not appropriate for all use cases and is not what a user expects. Each sequence of words that matches a keyword should therefore produce its own annotation by default.
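As a rough sketch of the requested default, the plain-Python function below returns one annotation per contiguous occurrence of a keyword. It does not use iamsystem and ignores the window parameter `w`; `annotate_all` is a hypothetical name, and splitting on single spaces is an assumption made to keep the offsets simple.

```python
def annotate_all(text, keyword):
    """Return (keyword, start, end) for every contiguous occurrence (hypothetical)."""
    tokens = []
    offset = 0
    for tok in text.split(" "):
        tokens.append((tok, offset, offset + len(tok)))
        offset += len(tok) + 1  # assumption: single-space separator
    kw = keyword.split(" ")
    annots = []
    for i in range(len(tokens) - len(kw) + 1):
        window = tokens[i:i + len(kw)]
        if [t[0] for t in window] == kw:
            # span runs from the first token's start to the last token's end
            annots.append((keyword, window[0][1], window[-1][2]))
    return annots


print(annotate_all("cancer cancer", "cancer"))
# [('cancer', 0, 6), ('cancer', 7, 13)]
```

Because only contiguous tokens are compared, the trailing 'prostate' in the second example text would not produce an extra annotation here; handling that case with a large window is the harder part the issue describes.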