
add spacy.Language as valid argument for 'spacy_pipeline' #19

Merged
TimSchopf merged 2 commits into TimSchopf:master from dominik-schwabe:master
Dec 23, 2022

@dominik-schwabe
Contributor

This commit allows reusing an object returned by spacy.load across many different KeyphraseVectorizer objects. I noticed that the nlp object gets loaded when fit is called, which makes extracting keyphrases from multiple documents super slow when a model like en_core_web_md is used.
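The idea of accepting either a model name or an already loaded pipeline can be sketched as follows. This is a minimal, self-contained illustration and not the library's actual code: `FakePipeline`, `CountingLoader`, and `resolve_pipeline` are hypothetical stand-ins, used so the example runs without spaCy installed.

```python
from typing import Callable, Union


class FakePipeline:
    """Stand-in for a loaded spaCy pipeline (hypothetical, illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name


class CountingLoader:
    """Hypothetical loader that records how often a model is loaded."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, name: str) -> FakePipeline:
        self.calls += 1  # a real model load would be the slow step
        return FakePipeline(name)


def resolve_pipeline(
    spacy_pipeline: Union[str, FakePipeline],
    loader: Callable[[str], FakePipeline],
) -> FakePipeline:
    """Return a pipeline object, loading it only when given a model name."""
    if isinstance(spacy_pipeline, str):
        return loader(spacy_pipeline)
    return spacy_pipeline  # already loaded: reuse it as-is


loader = CountingLoader()

# Passing the model name three times triggers three loads.
for _ in range(3):
    resolve_pipeline("en_core_web_md", loader)
loads_by_name = loader.calls

# Loading once and passing the object triggers no further loads.
nlp = loader("en_core_web_md")
pipelines = [resolve_pipeline(nlp, loader) for _ in range(3)]
loads_by_object = loader.calls - loads_by_name
```

With the real library, the equivalent would presumably be to call spacy.load once and pass the resulting object as the 'spacy_pipeline' argument of each vectorizer, as the PR title describes.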

TimSchopf added the enhancement label (New feature or request) on Dec 17, 2022
@TimSchopf
Owner

TimSchopf commented Dec 17, 2022

Hi Dominik,

thanks for the contribution. Can you also please add a short code example and explanation on how to use the new argument in the README.md file?

Also, you can extract keyphrases from multiple documents with the same object and calling fit only once by using a list of documents as inputs. This probably solves the issue already.

Best,
Tim

@dominik-schwabe
Contributor Author

I added a little example to the README.

> Also, you can extract keyphrases from multiple documents with the same object and calling fit only once by using a list of documents as inputs. This probably solves the issue already.

I usually run small experiments: I inspect the results on one document, change a few things, then try the changed document or a different one. I also usually use en_core_web_md. In that setup, loading a new nlp object for every small experiment adds a delay of about 5s instead of being instantaneous.

TimSchopf merged commit f5bee69 into TimSchopf:master on Dec 23, 2022