A possible way to deal with elisions by odelmarcelle · Pull Request #26 · SentometricsResearch/sentometrics

odelmarcelle · 2021-12-03T13:06:27Z

The current tokenization leaves French elisions attached to their words. As a result, some sentiment words are not identified when computing sentiment. For example, "l'abandon" is not identified as negative, whereas "abandon" is a negative word in the French LoughranMcDonald lexicon.

This pull request adds an argument to compute_sentiment, defaulting to TRUE, that simply removes a number of elision patterns at the beginning of each word. I'm not certain how this affects other languages, but I don't see how to make a language-specific filter with the current implementation.
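
To make the idea concrete, here is a minimal sketch of stripping leading elisions from already-tokenized words. It is not the code from this pull request: the helper name `strip_elisions` and the list of elided prefixes are assumptions chosen for illustration, and the actual pattern list in the PR may differ.

```r
# Hypothetical helper for illustration only; not the implementation in this PR.
# Removes a short elided prefix (l', d', qu', ...) at the start of each token.
strip_elisions <- function(tokens) {
  # Assumed prefix list: single-letter elisions plus a few "qu" forms,
  # followed by a straight (') or typographic (U+2019) apostrophe.
  pattern <- "^(?:[cdjlmnst]|qu|jusqu|lorsqu|puisqu)['\u2019]"
  # sub() replaces only the first match, which is anchored at the token start.
  sub(pattern, "", tokens, perl = TRUE, ignore.case = TRUE)
}

tokens <- c("l'abandon", "d'une", "qu'il", "abandon", "aujourd'hui")
cleaned <- strip_elisions(tokens)

# A toy stand-in for a lexicon lookup: "l'abandon" is missed before cleaning,
# while the cleaned token matches the lexicon entry "abandon".
lexicon_words <- c("abandon")
tokens %in% lexicon_words
cleaned %in% lexicon_words
```

Because the pattern is anchored at the start of the token and limited to a fixed set of prefixes, a word such as "aujourd'hui" is left untouched. The same anchoring is why the effect on other languages is hard to predict: any token that happens to start with one of these prefixes and an apostrophe would also be shortened.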

See the test file for an example.

sborms · 2022-01-01T14:17:56Z

Nice addition, well documented & good unit test! Some feedback:

I'd prefer renaming remove_elisions to do.removeElisions (consistent with the naming of logicals, cf. do.ignoreZeros).
Because you're not sure about the impact on other languages, and to avoid breaking existing examples or scripts, it might be safer to let the new argument default to FALSE. Your choice.
You'll also have to add the new argument to the ctr_agg() function.
You can also add yourself as a contributor in the DESCRIPTION file, and change the version to 1.1.0.

Once the changes are pushed, we can merge.

odelmarcelle added 3 commits December 3, 2021 14:01

remove_elisions argument (af6d016)
fix utf8 documentation (56b777f)
fix utf8 regex (8479514)

odelmarcelle wants to merge 3 commits into SentometricsResearch:master from odelmarcelle:remove_elisions
