Add nlp_text_splitter package #78
Conversation
jrobble
left a comment
Reviewed 7 of 9 files at r1, 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @brosenberg42 and @hhuangMITRE)
detection/nlp_text_splitter/install.sh line 1 at r2 (raw file):
#!/usr/bin/env bash
Needs header comments.
detection/nlp_text_splitter/install.sh line 7 at r2 (raw file):
main() {
    if ! options=$(getopt --name "$0" \
        --options t:gm:w:b: \
Should this be `--options t:gm:w:s:`?
Specifically, the `b:` should be replaced with `s:` since that's the short version of `--install-spacy-model | -s`.
detection/nlp_text_splitter/install.sh line 13 at r2 (raw file):
fi
eval set -- "$options"
local models_dir=/opt/wtp/models
Call this `wtp_models_dir` for clarity.
detection/nlp_text_splitter/install.sh line 130 at r2 (raw file):
same directory as this script)
--gpu, -g: Install the GPU version of PyTorch
--models-dir, -m <path>: Path where WTP models will be stored.
Call this `--wtp-models-dir` for clarity. Update it in the README as well.
hhuangMITRE
left a comment
Reviewable status: 5 of 9 files reviewed, 4 unresolved discussions (waiting on @brosenberg42 and @jrobble)
detection/nlp_text_splitter/install.sh line 1 at r2 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Needs header comments.
Fixed, thank you! I've updated the headers in this file to 2024. I can update the other headers from 2023->2024 as well.
detection/nlp_text_splitter/install.sh line 7 at r2 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Should this be `--options t:gm:w:s:`? Specifically, the `b:` should be replaced with `s:` since that's the short version of `--install-spacy-model | -s`.
Yes, it should be `s:` instead of `b:`. I fixed it, thanks!
detection/nlp_text_splitter/install.sh line 13 at r2 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Call this `wtp_models_dir` for clarity.
Done. I've also updated the corresponding function, `setup_wtp_models_dir`.
detection/nlp_text_splitter/install.sh line 130 at r2 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Call this `--wtp-models-dir` for clarity. Update it in the README as well.
Done.
detection/nlp_text_splitter/README.md line 40 at r4 (raw file):
setup a PyTorch installation with CUDA (GPU) libraries. - `--wtp-models-dir |-m <wtp-models-dir >`: Add this parameter to
Refactored to `wtp-models-dir`.
This space before the colon made it so that I could not do a GPU build no matter what I provided. I fixed this in a commit.
hhuangMITRE
left a comment
Reviewable status: 4 of 9 files reviewed, 5 unresolved discussions (waiting on @brosenberg42 and @jrobble)
detection/nlp_text_splitter/nlp_text_splitter/__init__.py line 105 at r6 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Not only do you need to check if the model name is the same, you need to also check if the model setting is the same (CPU vs. GPU).
I ran into a situation where on first run with `SENTENCE_MODEL_CPU_ONLY=false` the component logged `Using cached model: wtp-bert-mini`, but I saw no GPU activity.
Thanks for catching that; I tested the canine series and didn't catch this. I've made the requested changes and rebuilt the images just to make sure. I've tested the text splitter with GPU servers and confirmed that the caching works when updating a CPU `wtp-bert-mini` to use CUDA/GPU.
Confirmed that works; `wtp-bert-mini` used about 800 MB of GPU memory.
jrobble
left a comment
There was a problem hiding this comment.
Reviewed 3 of 4 files at r4, 1 of 1 files at r5, 1 of 1 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: complete! All files reviewed, all discussions resolved (waiting on @brosenberg42)