jnelson16
left a comment
The test passes on my laptop, so this is probably good to go. I still want to try running one of the state corpora through the new driver; I'll let you know how that goes.
quantgov/corpora/structures.py
Outdated
        """ Filter paths based on index values. """
        raise NotImplementedError

    def gen_indces_and_paths(self):
    RecursiveDirectoryCorpusDriver,
    NamePatternCorpusDriver,
    IndexDriver
    IndexDriver,
@OliverSherouse Is there a reason this is not called IndexCorpusDriver, to follow the pattern above?
Nope. Let's file that as a bug and rename it for 1.0.
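For reference, one way the rename could be staged is to keep the old name as a thin deprecated alias, in the same spirit as the corpora-to-corpus deprecation warning. This is only a sketch: the class bodies and the `index` argument are hypothetical stand-ins, not the actual quantgov implementation.

```python
import warnings


class IndexCorpusDriver:
    """Hypothetical stand-in for the driver under its new name."""

    def __init__(self, index):
        self.index = index


class IndexDriver(IndexCorpusDriver):
    """Deprecated alias kept so existing imports continue to work."""

    def __init__(self, *args, **kwargs):
        # Warn at the caller's location, then defer to the renamed class.
        warnings.warn(
            'IndexDriver is deprecated; use IndexCorpusDriver',
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

Existing code that instantiates `IndexDriver` would keep working and see a `DeprecationWarning`, and the alias could then be dropped in a later release.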
tests/test_corpora.py
Outdated
        rows.append((letter, number, path))
    index_path = directory.join('index.csv')
    with index_path.open('w', encoding='utf-8') as outf:
        outf.write(u'letter,number,path\n')
Should we be using csv.writer's writerows method for this? Or is writing the lines by hand more efficient for testing purposes? @OliverSherouse
Not more efficient, particularly, though not really a problem, either.
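For comparison, a minimal sketch of what the csv-based version might look like. The `write_index` helper and the sample rows are made up for illustration; only `csv.writer`, `writerow`, and `writerows` come from the standard library. Two behavioral differences from the hand-joined version are worth noting: the output ends with a trailing newline, and fields containing a comma get quoted.

```python
import csv
import io


def write_index(outf, rows, header=('letter', 'number', 'path')):
    """Write a header row followed by the data rows to an open text file."""
    # lineterminator='\n' matches the hand-written version's line endings;
    # the csv default is '\r\n'.
    writer = csv.writer(outf, lineterminator='\n')
    writer.writerow(header)
    writer.writerows(rows)


# In-memory usage; in the test, an open index file would be passed instead.
buf = io.StringIO()
write_index(buf, [('a', '1', 'a/1.txt'), ('b', '2', 'b/2.txt')])
index_text = buf.getvalue()
```

For a three-column index with plain paths the two approaches produce equivalent content, which fits the point that neither is a real problem here.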
I have an S3Driver working with the Wyoming (or as @OliverSherouse calls it, Wisconsin) corpus!
OliverSherouse
left a comment
Fix small bugs and a few questions.
tests/test_corpora.py
Outdated
        rows.append((letter, number, path))
    index_path = directory.join('index.csv')
    with index_path.open('w', encoding='utf-8') as outf:
        outf.write(u'letter,number,path\n')
Why do we have u strings? This ain't 2007, we're not writing Python 2!
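As a quick illustration of why the prefix is redundant: in Python 3 (3.3 and later, where the `u` prefix is accepted again), a `u''` literal is an ordinary `str`, so dropping it changes nothing. The variable names below are just for the example.

```python
# In Python 3, both literals produce the same str object type and value.
legacy_header = u'letter,number,path\n'
plain_header = 'letter,number,path\n'

same_value = legacy_header == plain_header
same_type = type(legacy_header) is type(plain_header) is str
```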
        outf.write(u'letter,number,path\n')
        outf.write(u'\n'.join(','.join(row) for row in rows))
    return quantgov.corpora.S3Driver(str(index_path),
                                     bucket='quantgov-databanks')
Will people outside the core dev team be able to run these tests?
As long as they have AWS credentials for boto, I believe so, since the bucket is public.
* Inaugurated 0.4.0 dev series * Sentiment analysis (#33) Closes #11 #12 #13 and adds Sentiment analysis! * complexity * complexity builtins * complexity builtins with tests * code review updates * option tests * added nltk requirement in setup.py * add pip install to .travis.yml * nltk fixes * another nltk fix * last nltk fix? * you know the drill * Update .travis.yml * nltk troubles * some final cleanup * if it aint broke... * textblob sentiment * tests and error raising * fixed install req * pep8 fixes * code review updates * fix travis file * import fixes * small fix * Test corpora (#35) * complexity * complexity builtins * complexity builtins with tests * code review updates * option tests * added nltk requirement in setup.py * add pip install to .travis.yml * nltk fixes * another nltk fix * last nltk fix? * you know the drill * Update .travis.yml * nltk troubles * some final cleanup * if it aint broke... * new corpora in English!! * hotfix to add timestamp as corpus identifier * Skl compatibility (#41) * Add sklearn 0.17 compatibility Paper over library reorganization. * renamed corpora to corpus, added deprecation warning (#42) * renamed corpora to corpus, added deprecation warning * moved load_driver and set up for future forcing of full imports of submodules Closes #31 * S3 drivers (#44) * initial working commit for s3 driver and database driver * removing 3.6 formatting * adding extra requirements list * adding basic s3 driver test * Removing unnecessary function * This ain't 2007 * test updates * adding s3driver to new corpus structure * Rounding (#45) * bumped version * Fix NLTK loading bug Fix evaluation order when NLTK is not present
* hotfix to add timestamp as corpus identifier (#39) * bumped version * Release 0.4 (#47) * Inaugurated 0.4.0 dev series * Sentiment analysis (#33) Closes #11 #12 #13 and adds Sentiment analysis! * complexity * complexity builtins * complexity builtins with tests * code review updates * option tests * added nltk requirement in setup.py * add pip install to .travis.yml * nltk fixes * another nltk fix * last nltk fix? * you know the drill * Update .travis.yml * nltk troubles * some final cleanup * if it aint broke... * textblob sentiment * tests and error raising * fixed install req * pep8 fixes * code review updates * fix travis file * import fixes * small fix * Test corpora (#35) * complexity * complexity builtins * complexity builtins with tests * code review updates * option tests * added nltk requirement in setup.py * add pip install to .travis.yml * nltk fixes * another nltk fix * last nltk fix? * you know the drill * Update .travis.yml * nltk troubles * some final cleanup * new corpora in English!! * hotfix to add timestamp as corpus identifier * Skl compatibility (#41) * Add sklearn 0.17 compatibility Paper over library reorganization. * renamed corpora to corpus, added deprecation warning (#42) * renamed corpora to corpus, added deprecation warning * moved load_driver and set up for future forcing of full imports of submodules Closes #31 * S3 drivers (#44) * initial working commit for s3 driver and database driver * removing 3.6 formatting * adding extra requirements list * adding basic s3 driver test * Removing unnecessary function * This ain't 2007 * test updates * adding s3driver to new corpus structure * Rounding (#45) * bumped version * Fix NLTK loading bug Fix evaluation order when NLTK is not present