DE TN Fixes #177
Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
…git strings Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
…00.00 Uhr or 0.00 Uhr Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
…ng with a domain name would be tagged as part of that domain name Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
for more information, see https://pre-commit.ci
@zoobereq please update the grammars path in Jenkins to the re-built CI grammars: https://github.com/NVIDIA/NeMo-text-processing/blob/main/Jenkinsfile#L15
nemo_text_processing/text_normalization/de/taggers/tokenize_and_classify.py
nemo_text_processing/text_normalization/de/verbalizers/electronic.py
w w w punkt a m a z o n punkt com punkt de .~www.amazon.com.de.
h t t p s doppelpunkt slash slash w w w punkt a b c punkt com slash a b fragezeichen gleichheitszeichen drei bindestrich slash a b s slash eins~https://www.abc.com/ab?=3-/abs/1
at z u c k~@zuck
at z o o b e r e q~@zoobereq
Don't use your name as an example. It leads to potential doxing.
Good point - thank you. Fixed.
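For readers following this thread, here is a minimal sketch of how the test-case lines quoted above can be read and checked. It assumes the two sides of each line are separated by a single `~`, with the spoken form on the left, as in the quoted lines. It is plain Python for illustration only, not the pynini grammar this PR changes; `parse_test_case` and `spell_out_tag` are hypothetical helper names, and `@beispiel` is a made-up tag.

```python
from typing import Tuple


def parse_test_case(line: str) -> Tuple[str, str]:
    """Split one 'spoken~written' test-case line into its two sides."""
    left, sep, right = line.strip().partition("~")
    if not sep:
        raise ValueError(f"missing '~' separator: {line!r}")
    return left, right


def spell_out_tag(tag: str) -> str:
    """Spell a social media tag letter by letter, reading '@' as 'at'.

    Assumption: the tag contains only letters after the '@'; digits and
    punctuation would need their own verbalization rules.
    """
    if not tag.startswith("@"):
        raise ValueError(f"not a social media tag: {tag!r}")
    # Unpacking the string yields one list element per character.
    return " ".join(["at", *tag[1:].lower()])


spoken, written = parse_test_case("at b e i s p i e l~@beispiel")
assert spell_out_tag(written) == spoken
```

Using a neutral placeholder such as `@beispiel` also keeps personal and company names out of the test data, in line with the review comment above.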
tests/nemo_text_processing/de/test_sparrowhawk_inverse_text_normalization.sh
tbartley94
left a comment
Remove references to yourself and competitors and LGTM
tests/nemo_text_processing/de/data_text_normalization/test_cases_electronic.txt
@zoobereq LGTM, will approve once Evelina is happy
ekmb
left a comment
Thanks!
* Adds support for social media tags (e.g. @zoobereq)
* Adds test cases for social media tags
* Fixes pathing for Sparrowhawk
* Fixes the issue of the DE normalizer not accepting comma-separated digit strings
* Fixes the issue where the normalizer didn't accept time formatted as 00.00 Uhr or 0.00 Uhr
* Fixes the issue where the sentence-final period in sentences ending with a domain name would be tagged as part of that domain name
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Removes unused imports
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Fixes the formatting
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Fixes #166 for DE
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Updates grammar paths
* Minor Fixes
* Fixes test cases

Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
Co-authored-by: Simon Zuberek <szuberek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
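Two of the fixes in this commit list, the `0.00 Uhr` / `00.00 Uhr` time format and the sentence-final period after a domain name, can be illustrated with ordinary regular expressions. This is a rough sketch under stated assumptions, not the WFST grammar the PR modifies: the patterns, `split_domain_token`, and `parse_time` are hypothetical, and the domain pattern is deliberately simplistic (dot-separated labels ending in a TLD of two or more letters).

```python
import re
from typing import Optional, Tuple

# Assumption: a domain is one or more "label." groups followed by a TLD.
DOMAIN = re.compile(r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")

# Assumption: hours 0-23 with an optional leading zero, a period as the
# separator, two-digit minutes, and an optional trailing " Uhr".
TIME = re.compile(r"(?P<hour>[01]?\d|2[0-3])\.(?P<minute>[0-5]\d)(?: Uhr)?")


def split_domain_token(token: str) -> Optional[Tuple[str, str]]:
    """Separate a domain from trailing periods, e.g. at the end of a sentence.

    Returns (domain, trailing_punctuation), or None if the token is not a domain.
    """
    match = DOMAIN.fullmatch(token.rstrip("."))
    if match is None:
        return None
    domain = match.group(0)
    return domain, token[len(domain):]


def parse_time(text: str) -> Optional[Tuple[int, int]]:
    """Parse times such as '2.30', '02.30' or '0.00 Uhr' into (hour, minute)."""
    match = TIME.fullmatch(text)
    if match is None:
        return None
    return int(match["hour"]), int(match["minute"])


# The sentence-final period is kept apart from the domain itself.
assert split_domain_token("www.example.de.") == ("www.example.de", ".")
# Both the one-digit and two-digit hour forms are accepted.
assert parse_time("0.00 Uhr") == parse_time("00.00 Uhr") == (0, 0)
```

The point of the domain helper is that the final period is returned separately, so it can be tagged as punctuation instead of being verbalized as part of the domain.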
What does this PR do?
This PR implements DE TN fixes for the following issues:
* Social media tags (e.g. @zoobereq and @zoobereq.net)
* Time formatted as 0.00 Uhr or 00.00 Uhr (e.g. 2.30 and 02.30)

This PR does not address the following:
Before your PR is "Ready for review"
Pre checks:
* Use git commit -s to sign.
* Run pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly @pytest.mark.run_only_on('CPU')).
* Sparrowhawk tests: bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
* Have you added test cases for both pytest and Sparrowhawk here.
* Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
* Have you added the license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
* Copyright 2015 and onwards Google, Inc.. See an example here.
* Remove import guards (try import: ... except: ...) if not already done.

PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.