
complexity builtins #32

Closed
jnelson16 wants to merge 18 commits into QuantGov:dev from jnelson16:dev


@jnelson16
Member

Covers #11, #12, #13

Contributor

@OliverSherouse left a comment

Mostly good stuff, just need a few edits.

from textblob import Word
from textblob import TextBlob

wn.ensure_loaded()
Contributor

Told you it was something simple!


import quantgov

from nltk.corpus import wordnet as wn
Contributor

Let's make these absolute imports.

return ('shannon_entropy',)

@staticmethod
def process_document(doc, word_pattern, stopwords):
Contributor

Let's offer a precision option rather than always rounding to 2 decimal places without asking.
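A minimal sketch of what a precision option could look like. This is not the PR's implementation: the function name, the default word pattern, and the `precision` parameter are assumptions for illustration, and the lemmatization step the PR performs is left out.

```python
import math
import re
from collections import Counter

# Hypothetical default; the PR passes its own word_pattern.
DEFAULT_WORD_PATTERN = re.compile(r"\b\w+\b")


def shannon_entropy(text, word_pattern=DEFAULT_WORD_PATTERN,
                    stopwords=frozenset(), precision=2):
    """Word-level Shannon entropy (in bits) of a text.

    precision is the number of decimal places to round to;
    pass None to get the unrounded value.
    """
    words = [w.lower() for w in word_pattern.findall(text)]
    words = [w for w in words if w not in stopwords]
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    # H = sum over distinct words of p * log2(1 / p)
    entropy = sum(
        (n / total) * math.log2(total / n) for n in counts.values()
    )
    return entropy if precision is None else round(entropy, precision)
```

Keeping 2 as the default preserves the current output format, while callers who want the raw value can pass `precision=None`.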

'testing': ['pytest-flake8'],
'complexity': [
'textblob',
'nltk'
Contributor

Isn't nltk a dependency of textblob?
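One way to check this from an environment where textblob is installed is to read the requirements the installed distribution declares. The helper below is a hypothetical sketch (the name `declares_dependency` is not part of any library); it relies only on the standard-library `importlib.metadata` module and also counts requirements that are conditional on an extra.

```python
import re
from importlib import metadata


def declares_dependency(package, dependency):
    """Return True/False if `package` is installed, or None if it is not.

    Compares only the requirement names, ignoring version specifiers
    and environment markers.
    """
    try:
        requirements = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return None
    names = {
        # Keep the leading name, e.g. "nltk" from "nltk>=3.1".
        re.split(r"[^A-Za-z0-9._-]", req.strip(), maxsplit=1)[0].lower()
        for req in requirements
    }
    return dependency.lower() in names
```

Calling `declares_dependency("textblob", "nltk")` would then answer the question for the installed version; if it returns True, listing nltk separately in the extras is redundant.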

@staticmethod
def process_document(doc):
sentences = TextBlob(doc.text).sentences
total_length = 0
Contributor

We can condense this, can't we?

sum(len(sentence.words) for sentence in sentences) / len(sentences)
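As a runnable sketch of the condensed form, with a guard for documents that contain no sentences. The function name and the `Sentence` stand-in are assumptions for illustration; the stand-in only mimics the `.words` attribute that textblob sentences expose, so the example runs without textblob installed.

```python
from collections import namedtuple

# Stand-in for a textblob sentence: anything with a `.words` sequence.
Sentence = namedtuple("Sentence", "words")


def mean_sentence_length(sentences):
    """Average number of words per sentence; 0.0 for an empty input."""
    sentences = list(sentences)
    if not sentences:
        # Avoid ZeroDivisionError on documents with no sentences.
        return 0.0
    return sum(len(s.words) for s in sentences) / len(sentences)


example = [
    Sentence(words=["One", "short", "sentence"]),
    Sentence(words=["Done"]),
]
print(mean_sentence_length(example))
```

With textblob, the call would be `mean_sentence_length(TextBlob(doc.text).sentences)`.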

arguments=[
quantgov.utils.CLIArg(
flags=('--pattern'),
kwargs={
Contributor

I don't think we really need this option. In fact, we could treat this as a special case of "count occurrences" and just wrap that function, couldn't we?
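The wrapping idea can be sketched as follows. Both function names and the word pattern are hypothetical and do not reflect QuantGov's actual builtins; the point is only that word counting reduces to occurrence counting with a fixed pattern, so no `--pattern` flag is needed.

```python
import re


def count_occurrences(text, pattern):
    """Count non-overlapping matches of a regex pattern in text."""
    return len(re.findall(pattern, text))


def count_words(text):
    """Word count as a special case: occurrences of a fixed word pattern."""
    return count_occurrences(text, r"\b\w+\b")
```

A term count such as `count_occurrences(text, r"\bshall\b")` and a plain word count then share one code path.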

['quantgov', 'corpus', 'shannon_entropy', str(PSEUDO_CORPUS_PATH)],
)
assert output == 'file,shannon_entropy\n1,7.14\n2,8.13\n'

Contributor

Can we get tests for the options as well?

@jnelson16
Member Author

@OliverSherouse this should be ready for a second look



class ShannonEntropy():
LEMMAS = {}
Contributor

LEMMAS should be lowercase because it's a class attribute, not a module-level constant.

extras_require={
'testing': ['pytest-flake8']
'testing': ['pytest-flake8'],
'builtins': [
Contributor

Either we should move these requirements to install_requires or we should make it so that qg can still run without them installed (and we throw an error if they aren't installed and someone tries to use that builtin)
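The second option could look roughly like this: import the optional package only when a builtin that needs it is used, and fail with an actionable message otherwise. The helper name `require_extra`, the extra name in the usage note, and the `pip install quantgov[...]` hint are assumptions for illustration.

```python
import importlib


def require_extra(module_name, extra):
    """Import an optional dependency lazily, or explain how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            "The '{0}' package is required for this builtin. "
            "Install it with: pip install quantgov[{1}]".format(
                module_name, extra
            )
        ) from exc
```

A builtin would call, for example, `textblob = require_extra("textblob", "complexity")` inside its processing function rather than at module import time, so the rest of the tool keeps working without the extra installed.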

@OliverSherouse
Contributor

Closing in favor of #33
