
Add nlp_text_splitter package #78

Merged

hhuangMITRE merged 12 commits into develop from feat/text-split-pkg on May 7, 2024

Conversation

@hhuangMITRE (Contributor) commented Apr 30, 2024

@jrobble (Member) left a comment


Reviewed 7 of 9 files at r1, 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @brosenberg42 and @hhuangMITRE)


detection/nlp_text_splitter/install.sh line 1 at r2 (raw file):

#!/usr/bin/env bash

Needs header comments.


detection/nlp_text_splitter/install.sh line 7 at r2 (raw file):

main() {
    if ! options=$(getopt --name "$0"  \
            --options t:gm:w:b: \

Should this be --options t:gm:w:s: ?

Specifically, the b: should be replaced with s: since that's the short version of --install-spacy-model | -s.


detection/nlp_text_splitter/install.sh line 13 at r2 (raw file):

    fi
    eval set -- "$options"
    local models_dir=/opt/wtp/models

Call this wtp_models_dir for clarity.


detection/nlp_text_splitter/install.sh line 130 at r2 (raw file):

                                       same directory as this script)
    --gpu, -g:                         Install the GPU version of PyTorch
    --models-dir, -m <path>:           Path where WTP models will be stored.

Call this --wtp-models-dir for clarity. Update in the README as well.

@hhuangMITRE hhuangMITRE requested a review from jrobble May 2, 2024 16:05
@hhuangMITRE (Contributor, Author) left a comment


Reviewable status: 5 of 9 files reviewed, 4 unresolved discussions (waiting on @brosenberg42 and @jrobble)


detection/nlp_text_splitter/install.sh line 1 at r2 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Needs header comments.

Fixed, thank you! I've updated the headers in this file to 2024. I can update the other headers from 2023->2024 as well.


detection/nlp_text_splitter/install.sh line 7 at r2 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Should this be --options t:gm:w:s: ?

Specifically, the b: should be replaced with s: since that's the short version of --install-spacy-model | -s.

Yes, it should be s: instead of b:. I fixed it, thanks!
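The fixed option parsing can be sketched as follows (a simplified, hypothetical version of the script, not the actual install.sh; the variable names in the case arms are illustrative). The point is that each long option taking an argument needs a matching short letter in `--options`, so `--install-spacy-model | -s` must be declared as `s:` rather than `b:`:

```shell
#!/usr/bin/env bash
# Simplified sketch of the corrected getopt call: the short-option string
# t:gm:w:s: now matches the long options (the old "b:" matched nothing).
parse_args() {
    local options
    options=$(getopt --name "$0" \
            --options t:gm:w:s: \
            --longoptions text-splitter-dir:,gpu,wtp-models-dir:,install-wtp-model:,install-spacy-model: \
            -- "$@") || return 1
    eval set -- "$options"
    while true; do
        case "$1" in
            -t|--text-splitter-dir)   text_splitter_dir=$2; shift 2 ;;
            -g|--gpu)                 gpu=true;             shift ;;
            -m|--wtp-models-dir)      wtp_models_dir=$2;    shift 2 ;;
            -w|--install-wtp-model)   wtp_model=$2;         shift 2 ;;
            -s|--install-spacy-model) spacy_model=$2;       shift 2 ;;
            --) shift; break ;;
        esac
    done
}

parse_args --install-spacy-model en_core_web_sm --gpu
echo "$spacy_model gpu=$gpu"
```

With the old `b:`, `-s` was not in the short-option string at all, so `getopt` would reject `-s <model>` even though `--install-spacy-model <model>` worked.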


detection/nlp_text_splitter/install.sh line 13 at r2 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Call this wtp_models_dir for clarity.

Done. I've also updated the function call to setup_wtp_models_dir.


detection/nlp_text_splitter/install.sh line 130 at r2 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Call this --wtp-models-dir for clarity. Update in the README as well.

Done.


detection/nlp_text_splitter/README.md line 40 at r4 (raw file):

  setup a PyTorch installation with CUDA (GPU) libraries.

- `--wtp-models-dir |-m <wtp-models-dir >`: Add this parameter to

Refactored to wtp-models-dir

@jrobble (Member) commented May 3, 2024

detection/nlp_text_splitter/install.sh line 34 at r5 (raw file):

    if ! options=$(getopt --name "$0"  \
            --options t:gm:w:s: \
            --longoptions text-splitter-dir:,gpu,wtp-models-dir :,install-wtp-model:,install-spacy-model: \

This space before the colon (in `wtp-models-dir :,install-wtp-model`) made it so that I could not do a GPU build no matter what I provided for BUILD_TYPE.

I fixed this in a commit.
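The failure mode can be reproduced in isolation (a hypothetical minimal example, not the actual script): because the `--longoptions` value in install.sh is an unquoted word, the stray space splits it into two shell words, so `getopt` never registers the long options after `wtp-models-dir` and treats the leftover fragment as a positional parameter.

```shell
# Hypothetical minimal reproduction (not the actual install.sh): the stray
# space splits the unquoted --longoptions value into two shell words.
good='gpu,wtp-models-dir:,install-wtp-model:'
bad='gpu,wtp-models-dir :,install-wtp-model:'

# Corrected string: --gpu is normalized as an option before the "--".
getopt --options g --longoptions "$good" -- --gpu

# Broken string (left unquoted on purpose to mimic the word split): getopt
# sees ":,install-wtp-model:" as a positional parameter, and --gpu lands
# after the "--" separator instead of being parsed as an option, so the
# script's option loop never sees it.
getopt --options g --longoptions $bad -- --gpu
```

This is why no BUILD_TYPE value could produce a GPU build: `--gpu` was never recognized as an option in the first place.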

@jrobble (Member) commented May 4, 2024

detection/nlp_text_splitter/nlp_text_splitter/__init__.py line 105 at r6 (raw file):

            self._default_lang = default_lang

        if self._model_name == wtp_model_name:

Not only do you need to check if the model name is the same, you need to also check if the model setting is the same (CPU vs. GPU).

I ran into a situation where, on first run with SENTENCE_MODEL_CPU_ONLY=false, the component logged `Using cached model: wtp-bert-mini` but I saw no GPU activity.

@hhuangMITRE (Contributor, Author) left a comment


Reviewable status: 4 of 9 files reviewed, 5 unresolved discussions (waiting on @brosenberg42 and @jrobble)


detection/nlp_text_splitter/nlp_text_splitter/__init__.py line 105 at r6 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Not only do you need to check if the model name is the same, you need to also check if the model setting is the same (CPU vs. GPU).

I ran into a situation where, on first run with SENTENCE_MODEL_CPU_ONLY=false, the component logged `Using cached model: wtp-bert-mini` but I saw no GPU activity.

Thanks for catching that; I tested the canine series of models and missed this case. I've made the requested changes and rebuilt the images just to make sure. I've tested the text-splitter with GPU servers and confirmed that caching works when updating a CPU wtp-bert-mini to use CUDA/GPU.

Confirmed that works; bert-mini used about 800 MB of GPU memory.
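The caching fix discussed above can be sketched as follows (a hypothetical illustration, not the component's actual code; `ModelCache` and `loader` are made-up names): the cache key must include the device setting as well as the model name, so that flipping SENTENCE_MODEL_CPU_ONLY forces a reload instead of reusing the CPU-resident model.

```python
# Hypothetical sketch (not the component's actual code): key the model cache
# on both the model name and the device setting, so a CPU->GPU switch loads
# a fresh model rather than reusing the cached CPU one.
class ModelCache:
    def __init__(self):
        self._model = None
        self._key = None  # (model_name, use_gpu)

    def get(self, model_name, use_gpu, loader):
        key = (model_name, use_gpu)
        if self._key != key:  # name OR device changed -> load fresh
            self._model = loader(model_name, use_gpu)
            self._key = key
        return self._model
```

Checking only `model_name` reproduces the bug jrobble saw: the second request with `use_gpu=True` would still return the CPU-resident model and log it as cached.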

@jrobble (Member) left a comment


Reviewed 3 of 4 files at r4, 1 of 1 files at r5, 1 of 1 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @brosenberg42)

@hhuangMITRE hhuangMITRE merged commit 23be9c4 into develop May 7, 2024
@hhuangMITRE hhuangMITRE deleted the feat/text-split-pkg branch May 7, 2024 13:52

Labels

enhancement (New feature or request)

3 participants