Code for creating topic modeling curation files, for annotators to work with topic modeling output in order to label and describe the topics (thus creating human-usable codes, useful for content analysis).
To run the scripts, first set up a conda environment and install the required libraries, run: conda env create --file environment.yml, and then activate the conda environment using conda activate topic_curation.
The main base script is create_topic_curation_files.py, and generated output files that would be used for curation are in example_data/outputs/.
To add custom columns for getting annotations and ratings other than just getting topic labelings, the code in create_topic_curation_files_with_custom_ratings_columns.py can be modified and repurposed according to the use case (example output is in example_data/outputs_with_custom_col_in_topic_word_file).
NOTE: To run the scripts with the example data in example_data/, a raw_documents.txt (with one document text per line) must be present (not there because the one that goes with rest of the files is too big).
TO-DO:
- Further documentation.
- More command line options and flexibility.
- Custom ratings columns to be provided as an input to the script, likely by having the names and values for those columns in a file.