Divide by zero error when trying to use `KeyphraseCountVectorizer` with BERTopic

I am trying to use `KeyphraseCountVectorizer` with BERTopic, following the example provided here: https://github.com/TimSchopf/KeyphraseVectorizers#topic-modeling-with-bertopic-and-keyphrasevectorizers

```
from keyphrase_vectorizers import KeyphraseCountVectorizer
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# load text documents
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
# only use subset of the data
docs = docs[:5000]

# train topic model with KeyphraseCountVectorizer
keyphrase_topic_model = BERTopic(vectorizer_model=KeyphraseCountVectorizer())
keyphrase_topics, keyphrase_probs = keyphrase_topic_model.fit_transform(docs)

```

This produces the following error (a `RuntimeWarning`):

```
RuntimeWarning: divide by zero encountered in true_divide
  idf = np.log((avg_nr_samples / df)+1)
```

I have run it on several datasets of different sizes, and the error occurs every time.
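
For reference, the same kind of warning can be reproduced with plain NumPy whenever the `df` array in that expression contains a zero. The snippet below is only a sketch under that assumption: `idf_like` and the sample values are made up for illustration and are not BERTopic code, and I have not confirmed that a zero document frequency is what actually happens inside the library.

```python
import warnings

import numpy as np


def idf_like(df, avg_nr_samples):
    """Hypothetical stand-in mirroring the expression shown in the warning."""
    return np.log((avg_nr_samples / df) + 1)


# Assumption for illustration: the second term has a document frequency of zero.
df = np.array([3.0, 0.0, 5.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised in this block
    idf = idf_like(df, avg_nr_samples=10.0)

for w in caught:
    print(w.category.__name__, w.message)
print(idf)
```

If this is the cause, the zero entry ends up as `inf` in the result while the other entries stay finite, which would match the message pointing at the division rather than at the logarithm.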

I have also posted the same question in the BERTopic repository: https://github.com/MaartenGr/BERTopic/issues/1050

Any ideas about what causes this would be very helpful.
