Description
From Bill:
(word1, frequency), (word2, frequency), ...
then trying to measure how far that distribution is from uniform
one simple nice way is entropy
-(P(word1)*log(P(word1)) + P(word2)*log(P(word2)) + ...)
where P(word1) is just the frequency of word1 divided by the total number of words.

It's nice because it measures how "unpredictable" the signal is. If most words have zero frequency and only a few are common, then it's predictable. Or, if all the words are exactly the same, then it's predictable. But if it's crazy town, then it's not predictable.
This seems like a decent resource: http://normal-extensions.com/2013/08/04/entropy-for-n-grams/
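A minimal sketch of the computation Bill describes, assuming plain Shannon entropy over word frequencies (the function name `word_entropy` and the choice of log base 2 are assumptions; any base works, it only changes the units):

```python
from collections import Counter
from math import log2

def word_entropy(words):
    """Shannon entropy (in bits) of the word-frequency distribution.

    Counts each word, converts counts to probabilities P(word) =
    count / total, and returns -sum(P * log2(P)).
    """
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A repetitive signal is predictable (low entropy); a varied one is not.
print(word_entropy("the the the the cat".split()))        # ~0.72 bits
print(word_entropy("the quick brown fox jumps".split()))  # log2(5) ~ 2.32 bits
```

Higher values mean the distribution is closer to uniform; a single repeated word gives 0.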