Bug/augmentation output differs from input file #734

Merged
ArshaanNazir merged 4 commits into release/1.4.0 from
bug/augmentation-output-differs-from-input-file
Aug 31, 2023

ArshaanNazir (Contributor) commented Aug 30, 2023

Description

This PR fixes the output format of augmented files. It also fixes swap_entities so that it handles sentences whose entities carry only I- labels, with no B- tag.

Augmented files were written in a different format from the input files: as the example below shows, the POS and chunk columns were replaced with -X- placeholders. This inconsistency was negatively impacting model training and evaluation.

Expected Output

-DOCSTART- -X- -X- O

CRICKET NNP B-NP O
- : O O
LEICESTERSHIRE NNP B-NP B-ORG
TAKE NNP I-NP O
OVER IN B-PP O
AT NNP B-NP O
TOP NNP I-NP O
AFTER NNP I-NP O
INNINGS NNP I-NP O
VICTORY NN I-NP O
. . O O

Actual Output

-DOCSTART- -X- -X- O

CRICKET -X- -X- O
- -X- -X- O
LEICESTERSHIRE -X- -X- B-ORG
TAKE -X- -X- O
OVER -X- -X- O
AT -X- -X- O
TOP -X- -X- O
AFTER -X- -X-O
INNINGS -X- -X- O
VICTORY -X- -X- O
. -X- -X- O
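
To illustrate the expected behaviour, here is a minimal sketch of re-emitting augmented tokens while keeping the original POS and chunk columns. It is not the code from this PR: the helper names `parse_conll_line` and `format_augmented`, the one-to-one token alignment, and the lowercase perturbation are all assumptions made for the example.

```python
def parse_conll_line(line):
    """Split a 4-column CoNLL-style line into (token, pos, chunk, ner)."""
    token, pos, chunk, ner = line.split()
    return token, pos, chunk, ner


def format_augmented(original_lines, augmented_tokens):
    """Re-emit augmented tokens, reusing the original POS, chunk and NER columns.

    Assumption: original and augmented tokens are aligned one-to-one.
    """
    out = []
    for line, new_token in zip(original_lines, augmented_tokens):
        _, pos, chunk, ner = parse_conll_line(line)
        out.append(f"{new_token} {pos} {chunk} {ner}")
    return out


original = [
    "CRICKET NNP B-NP O",
    "- : O O",
    "LEICESTERSHIRE NNP B-NP B-ORG",
]
# Hypothetical augmentation: lowercase every token.
augmented = [parse_conll_line(line)[0].lower() for line in original]
result = format_augmented(original, augmented)
print("\n".join(result))
```

With this approach the augmented file keeps the same four columns as the input instead of falling back to -X- placeholders.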

Swap-entities issue:
image
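
The description mentions sentences that carry I- labels without a preceding B- tag. One way to tolerate such input when locating entities to swap is sketched below. The function name `extract_entity_spans` and the span representation are hypothetical and are not taken from the PR.

```python
def extract_entity_spans(labels):
    """Return (start, end, type) spans over a label sequence; end is exclusive.

    An I- tag that does not continue a span of the same type opens a new span,
    so sentences tagged with I- labels only are still grouped into entities.
    """
    spans = []
    start, current = None, None
    for i, label in enumerate(labels):
        prefix, _, ent_type = label.partition("-")
        continues = prefix == "I" and ent_type == current
        if not continues:
            # Close the span that was open, if any.
            if current is not None:
                spans.append((start, i, current))
            # Open a new span on B- or on an orphan I- tag.
            if prefix in ("B", "I"):
                start, current = i, ent_type
            else:
                start, current = None, None
    if current is not None:
        spans.append((start, len(labels), current))
    return spans


labels = ["I-ORG", "I-ORG", "O", "B-PER", "I-PER", "I-LOC"]
print(extract_entity_spans(labels))
```

Treating an orphan I- tag as the start of an entity means a swap can replace the whole span rather than skipping it or splitting it token by token.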

ArshaanNazir linked an issue Aug 30, 2023 that may be closed by this pull request
ArshaanNazir added the 🐛 Bug and 💡Enhancements labels Aug 30, 2023
chakravarthik27 (Collaborator) left a comment

LGTM

ArshaanNazir merged commit b86680a into release/1.4.0 Aug 31, 2023
ArshaanNazir deleted the bug/augmentation-output-differs-from-input-file branch September 6, 2023 04:55

Development

Successfully merging this pull request may close these issues.

Augmentation Output Differs from Input File