Data files for CIS 5300: Natural Language Processing homework assignments.
The files are distributed as GitHub release assets. Each homework notebook downloads the files it needs automatically in its Section 0 (Setup).

| Release | Homework | Size | Contents |
|---|---|---|---|
| su26 | HW01 | 43 MB | Complex word data, n-gram counts, domain generalization datasets |
| su26 | HW03 | 2.1 MB | Shakespeare text, city name classification data |
| su26 | HW04 | 906 KB | Shakespeare plays, word clustering data, SimLex-999 |
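
For anyone who wants to fetch an asset outside the notebooks, the following is a minimal sketch of how a release asset can be downloaded. It is not the notebooks' actual setup code. The URL pattern is GitHub's standard release download path; the owner, repository, and asset file names (`example-org`, `example-data`, `hw01.zip`) are hypothetical placeholders, and `release_asset_url` and `fetch_asset` are illustrative helper names.

```python
from pathlib import Path
from urllib.request import urlretrieve


def release_asset_url(owner: str, repo: str, tag: str, asset: str) -> str:
    """Build the standard GitHub download URL for a release asset."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}"


def fetch_asset(url: str, dest_dir: str = "data") -> Path:
    """Download `url` into `dest_dir`, skipping the download if the file exists."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(dest))
    return dest


# Hypothetical values: substitute the real repository and asset names.
url = release_asset_url("example-org", "example-data", "su26", "hw01.zip")
print(url)
```

Calling `fetch_asset(url)` would then save the file under `data/`. Because existing files are skipped, rerunning a setup cell does not download the asset again.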