A possible way to deal with elisions #26

Open
odelmarcelle wants to merge 3 commits into SentometricsResearch:master from odelmarcelle:remove_elisions

odelmarcelle (Contributor) commented Dec 3, 2021

The current tokenization leaves French elisions attached to their words, so some sentiment words go unmatched when computing sentiment. For example, "l'abandon" is not identified as negative, even though "abandon" is a negative word in the French LoughranMcDonald lexicon.

This pull request adds an argument, remove_elisions, to compute_sentiment. It defaults to TRUE and simply removes a number of elision patterns at the beginning of each word. I'm not certain how this might affect other languages, but I don't see how to make the filter language-specific with the current implementation.

See the test file for an example.
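
As a rough illustration of the idea (a minimal sketch in base R, not the code in this pull request), the stripping step could look like the following. The helper name strip_elisions, the list of elided prefixes, and the one-word stand-in lexicon are all assumptions made for the example.

```r
# Hypothetical helper, NOT the function added by this pull request.
# The set of elided prefixes below is an assumption for illustration.
strip_elisions <- function(tokens) {
  # An elided clitic (l', d', j', qu', ...) at the start of a token,
  # followed by a straight or a curly apostrophe.
  pattern <- "^(l|d|j|m|n|s|t|c|qu)['\u2019]"
  sub(pattern, "", tokens, ignore.case = TRUE)
}

tokens  <- c("l'abandon", "d'une", "qu'il", "abandon", "aujourd'hui")
lexicon <- c("abandon")  # stand-in for a list of negative words

# Before stripping, only the bare word matches the lexicon.
before <- tokens %in% lexicon

# After stripping, "l'abandon" matches as well; "aujourd'hui" is left
# untouched because its apostrophe is not preceded by a listed prefix
# at the start of the token.
stripped <- strip_elisions(tokens)
after    <- stripped %in% lexicon
```

Because the pattern is anchored at the start of the token, apostrophes inside a word are left alone. The same rule would also strip the prefix from tokens such as "D'Angelo" or the Italian "l'amore", which is the kind of cross-language side effect that is hard to rule out.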

sborms (Collaborator) commented Jan 1, 2022

Nice addition, well documented & good unit test! Some feedback:

  • Please rename remove_elisions to do.removeElisions, for consistency with the naming of other logical arguments (cf. do.ignoreZeros).
  • Because you're not sure about the impact on other languages, and to avoid breaking existing examples or scripts, it might be safer to let the new argument default to FALSE. Your choice.
  • You'll also have to add the new argument to the ctr_agg() function.
  • You can also add yourself as a contributor in the DESCRIPTION file, and change the version to 1.1.0.

Once the changes are pushed, we can merge.
