Conversation
OliverSherouse
left a comment
There was a problem hiding this comment.
Mostly good stuff, just need a few edits.
quantgov/corpora/builtins.py
Outdated
| from textblob import Word | ||
| from textblob import TextBlob | ||
|
|
||
| wn.ensure_loaded() |
There was a problem hiding this comment.
Told you it was something simple!
quantgov/corpora/builtins.py
Outdated
|
|
||
| import quantgov | ||
|
|
||
| from nltk.corpus import wordnet as wn |
There was a problem hiding this comment.
Let's make these absolute imports.
quantgov/corpora/builtins.py
Outdated
| return ('shannon_entropy',) | ||
|
|
||
| @staticmethod | ||
| def process_document(doc, word_pattern, stopwords): |
There was a problem hiding this comment.
Let's give an option for precision, not just round at 2 without asking
| 'testing': ['pytest-flake8'], | ||
| 'complexity': [ | ||
| 'textblob', | ||
| 'nltk' |
There was a problem hiding this comment.
Isn't nltk a dependency of textblob?
quantgov/corpora/builtins.py
Outdated
| @staticmethod | ||
| def process_document(doc): | ||
| sentences = TextBlob(doc.text).sentences | ||
| total_length = 0 |
There was a problem hiding this comment.
We can condense this, can't we?
sum(len(sentence.words) for sentence.sentences / len(sentences)
quantgov/corpora/builtins.py
Outdated
| arguments=[ | ||
| quantgov.utils.CLIArg( | ||
| flags=('--pattern'), | ||
| kwargs={ |
There was a problem hiding this comment.
I don't think we really need this option. In fact, we could even take this as a special case of "count occurences", and just wrap that function, wouldn't we?
| ['quantgov', 'corpus', 'shannon_entropy', str(PSEUDO_CORPUS_PATH)], | ||
| ) | ||
| assert output == 'file,shannon_entropy\n1,7.14\n2,8.13\n' | ||
|
|
There was a problem hiding this comment.
Can we get tests for the options as well?
|
@OliverSherouse this should be ready for a second look |
in setup.py
nltk fixes
nltk troubles
quantgov/corpora/builtins.py
Outdated
|
|
||
|
|
||
| class ShannonEntropy(): | ||
| LEMMAS = {} |
There was a problem hiding this comment.
Lemmas should be lowercased because it's not module-level
| extras_require={ | ||
| 'testing': ['pytest-flake8'] | ||
| 'testing': ['pytest-flake8'], | ||
| 'builtins': [ |
There was a problem hiding this comment.
Either we should move these requirements to install_requires or we should make it so that qg can still run without them installed (and we throw an error if they aren't installed and someone tries to use that builtin)
|
Closing in favor of #33 |
Covers #11, #12, #13