I am trying to use `KeyphraseCountVectorizer` with BERTopic, following the example provided here: https://github.com/TimSchopf/KeyphraseVectorizers#topic-modeling-with-bertopic-and-keyphrasevectorizers
```python
from keyphrase_vectorizers import KeyphraseCountVectorizer
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# load text documents
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

# only use subset of the data
docs = docs[:5000]

# train topic model with KeyphraseCountVectorizer
keyphrase_topic_model = BERTopic(vectorizer_model=KeyphraseCountVectorizer())
keyphrase_topics, keyphrase_probs = keyphrase_topic_model.fit_transform(docs)
```
This produces the following warning:

```
RuntimeWarning: divide by zero encountered in true_divide
idf = np.log((avg_nr_samples / df)+1)
```
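As far as I can tell, the warning comes from the `idf` line quoted above: `df` holds one total count per vocabulary term (column), so the division by zero happens as soon as any vocabulary term has a total count of zero. The sketch below reproduces this with a toy NumPy matrix. The matrix and the way `df` and `avg_nr_samples` are computed here are my own illustrative assumptions, not BERTopic's actual code; my understanding is that BERTopic feeds its c-TF-IDF step one row per topic (all documents of a topic joined together).

```python
import warnings

import numpy as np

# Toy count matrix (illustrative data, not real BERTopic output):
# rows = documents (assumed: one row per topic in BERTopic),
# columns = vocabulary terms.
# The last column is a vocabulary term that never occurs in any row.
X = np.array([
    [2, 0, 1, 0],
    [0, 3, 1, 0],
    [1, 1, 0, 0],
])

# Total count of each vocabulary term across all rows.
df = X.sum(axis=0)

# Average number of words per row, truncated to an int.
avg_nr_samples = int(X.sum(axis=1).mean())

# Same expression as in the warning; record any RuntimeWarning it emits.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    idf = np.log((avg_nr_samples / df) + 1)

# Columns with a zero total count are exactly the ones that divide by zero
# and end up with an infinite idf value.
zero_df_columns = np.flatnonzero(df == 0)
print(zero_df_columns)  # -> [3]
print([str(w.message) for w in caught])
```

With real data, the same check (`np.flatnonzero(df == 0)`) against the count matrix produced by the vectorizer would show which keyphrases, if any, never receive a count. I have not verified that this is what happens with `KeyphraseCountVectorizer`.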
I have run it on several datasets of different sizes, and the warning appears every time.
I have also posted this question in the BERTopic repository: MaartenGr/BERTopic#1050
Any ideas about what causes this would be very helpful.