use of custom stop words #28

@gboyega1

When I use KeyphraseCountVectorizer with the KeyBERT library and pass a list of custom stop words, the list doesn't appear to have any effect on the extracted keyphrases.

Without a custom stop word list:

vectorizer = KeyphraseCountVectorizer()
kw_model.extract_keywords(strip_html(course[2]), vectorizer=vectorizer, top_n = 15, use_mmr = True, diversity = 0.45)

Output:

[('hrm msc students', 0.5963), ('human resources', 0.5466), ('organisational development', 0.505), ('student experience', 0.4352), ('business school', 0.4336), ('people profession', 0.4273), ('pwc staff', 0.416), ('london offices', 0.4049), ('research leaders', 0.3931), ('professional stream skills workshop satisfy requirements', 0.3907), ('quality education', 0.3792), ('cipd accreditation', 0.3522), ('dissertation', 0.3428), ('relevant programmes', 0.3143), ('edge practice', 0.2627)]

With a custom stop word list (stpwrds, which includes 'msc') intended to discard 'msc':

vectorizer = KeyphraseCountVectorizer(stop_words = stpwrds)
kw_model.extract_keywords(strip_html(course[2]), vectorizer=vectorizer, top_n = 15, use_mmr = True, diversity = 0.45)

The output contains the same keyphrases with identical scores:

[('hrm msc students', 0.5963), ('human resources', 0.5466), ('organisational development', 0.505), ('student experience', 0.4352), ('business school', 0.4336), ('people profession', 0.4273), ('pwc staff', 0.416), ('london offices', 0.4049), ('research leaders', 0.3931), ('professional stream skills workshop satisfy requirements', 0.3907), ('quality education', 0.3792), ('cipd accreditation', 0.3522), ('dissertation', 0.3428), ('relevant programmes', 0.3143), ('edge practice', 0.2627)]

Note in particular that 'hrm msc students' is still included, even though 'msc' is in the stop word list.
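To illustrate the behaviour I expected, here is a minimal sketch that post-filters the extracted keyphrases and drops any phrase containing a custom stop word as a whole token. The helper name `drop_keyphrases_with_stop_words` is hypothetical and is not part of KeyBERT or the vectorizer; it only assumes that the results are a list of (keyphrase, score) tuples, as in the output above, and it may not match how the vectorizer is meant to treat stop words internally.

```python
from typing import Iterable, List, Tuple


def drop_keyphrases_with_stop_words(
    keywords: Iterable[Tuple[str, float]],
    stop_words: Iterable[str],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: keep only keyphrases with no stop word token."""
    # Compare case-insensitively on whitespace-separated tokens.
    blocked = {word.lower() for word in stop_words}
    return [
        (phrase, score)
        for phrase, score in keywords
        if blocked.isdisjoint(phrase.lower().split())
    ]


# First three results copied from the output above.
results = [
    ('hrm msc students', 0.5963),
    ('human resources', 0.5466),
    ('organisational development', 0.505),
]
filtered = drop_keyphrases_with_stop_words(results, ['msc'])
print(filtered)
```

With this filter, 'hrm msc students' is removed while the other phrases and their scores are left untouched. It is only a stopgap applied after extraction, so it does not change the scores or pull in replacement phrases the way stop word handling inside the vectorizer presumably would.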

Any help with this would be greatly appreciated.

Labels

bug (Something isn't working)
