GitHub - prasiyer/NLP_Project

Installation

To run the notebooks and scripts in this repository, the following libraries are required:

pandas 0.24.2
NumPy 1.17.4
pickle (part of the Python standard library, no separate installation needed)
sqlalchemy
seaborn 0.9.0
scikit-learn 0.21.2
nltk

Project Motivation

The purpose of this project is to build a classifier for messages received in disaster zones. The classifier assigns each incoming message to one or more categories, and the predicted category can then be used to route the message to the appropriate agency. The intended benefit is a prompt response to incoming messages.

Technical Details

This project demonstrates:

Use of a Pipeline to execute the ML workflow
The workflow steps are: a) reading and cleaning the data, b) training a classifier, c) evaluating the trained classifier
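As a rough illustration of the train and evaluate steps, the following is a minimal sketch of a scikit-learn Pipeline. The toy messages, the two labels, and the choice of TfidfVectorizer are assumptions made for this example only; the actual notebooks may use different features, labels, and estimators.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the cleaned disaster messages.
train_messages = [
    "we need water and food",
    "water supply is running out",
    "send food to the shelter",
    "the bridge has collapsed",
    "roads are blocked by debris",
    "bridge and roads are damaged",
]
train_labels = ["aid", "aid", "aid", "infra", "infra", "infra"]
test_messages = ["people need food and water", "the roads are damaged"]
test_labels = ["aid", "infra"]

# Chain text vectorization and classification into a single estimator,
# so that fit() and predict() apply every step in order.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(n_estimators=10, random_state=0)),
])

pipeline.fit(train_messages, train_labels)          # b) train
predictions = pipeline.predict(test_messages)       # c) evaluate
test_accuracy = accuracy_score(test_labels, predictions)
print(predictions, test_accuracy)
```

Wrapping the vectorizer and the classifier in one object means the same text transformation is applied at training and prediction time.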

File Descriptions

The repository consists of 2 main folders: Data and Code.

The Data folder has:

2 CSV files: disaster_messages.csv and disaster_categories.csv are the input data files. These files contain the messages received from disaster regions and the corresponding categories of the messages, respectively
Database file: DisasterResponse.db is a SQLite database. This database has a main table (Message_Category). This table contains the clean data [X: Tokenized message, y: Categories] used to train and evaluate the classifier
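To make the role of the Message_Category table concrete, here is a small sketch that writes a few cleaned rows to a SQLite table and reads them back as X and y. It uses an in-memory database and the standard-library sqlite3 module; the column names (message, category) and the sample rows are hypothetical, since the real table schema is not described here.

```python
import sqlite3

# Hypothetical cleaned rows: (message, category).
rows = [
    ("we need water and food", "aid"),
    ("the bridge has collapsed", "infra"),
]

# In-memory database so the sketch does not touch DisasterResponse.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Message_Category (message TEXT, category TEXT)")
conn.executemany("INSERT INTO Message_Category VALUES (?, ?)", rows)
conn.commit()

# Read the table back and split it into inputs (X) and targets (y).
loaded = conn.execute(
    "SELECT message, category FROM Message_Category"
).fetchall()
X = [message for message, _ in loaded]
y = [category for _, category in loaded]
conn.close()

print(X, y)
```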

The Code folder has:

Data_ETL.ipynb & process_data.py: These are the Jupyter notebook and the corresponding python script for reading, cleaning and loading of the input data into a database
ML_NLP_Workflow.ipynb and train_classifier.py: These are the Jupyter notebook and the corresponding Python script for training the classifier. The script runs a grid search over RandomForest and KNeighbors classifiers.
Model_Evaluation.ipynb: This notebook analyzes the performance of the classifier. The output categories are separated into 2 sets [prominent and other] based on how frequently they occur in the dataset
run.py: This script loads the trained model and serves it as a web app. The location of the trained model is set in the script
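One way to grid-search across two classifier types, as train_classifier.py is described as doing, is to treat the classifier step itself as a searchable parameter. The sketch below shows that pattern under stated assumptions: the toy data, the parameter values, and the TfidfVectorizer step are illustrative and are not taken from the script.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy data: 4 messages per class so 2-fold CV can stratify.
messages = [
    "we need water and food",
    "water supply is running out",
    "send food to the shelter",
    "children need food and medicine",
    "the bridge has collapsed",
    "roads are blocked by debris",
    "bridge and roads are damaged",
    "power lines are down on the road",
]
labels = ["aid"] * 4 + ["infra"] * 4

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(random_state=0)),  # placeholder step
])

# Each dict is a separate sub-grid; "clf" swaps the estimator itself.
param_grid = [
    {"clf": [RandomForestClassifier(random_state=0)],
     "clf__n_estimators": [5, 10]},
    {"clf": [KNeighborsClassifier()],
     "clf__n_neighbors": [1, 3]},
]

search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(messages, labels)

best_name = type(search.best_estimator_.named_steps["clf"]).__name__
print(best_name, search.best_params_)
```

Using a list of dictionaries keeps each classifier's hyperparameters separate, so n_neighbors is never paired with a RandomForest.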

Instructions

process_data.py: This script accepts 3 input parameters -
a) messages_filepath (str): Location of the csv file containing the disaster messages
b) categories_filepath (str): Location of the csv file containing the categories for the disaster messages
c) database_filepath (str): String containing the location and name of the database. The pandas Dataframe with the transformed data will be saved as a table in this database
d) The script is run from the terminal:
cd NLP_Project ## go to the location of the repository
python ./Code/process_data.py ./Data/disaster_messages.csv ./Data/disaster_categories.csv ./Data/DisasterResponse1.db
train_classifier.py: This script accepts 2 input parameters -
a) database_filepath (str): String containing the location and name of the database. This database has the input data for training (as a table)
b) model_filepath (str): String containing the location where the trained model should be stored (as a pickle file)
c) The script is run from the terminal:
python ./Code/train_classifier.py ./Data/DisasterResponse1.db ./Code/cv_model1.sav
run.py: This script does not take any input parameters. Before running it, verify the location of the saved model in the run.py script:
python ./Code/run.py
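The sketch below illustrates how a script could read the two positional parameters of train_classifier.py and store or reload a model as a pickle file. The helper names (parse_train_args, save_model, load_model) are hypothetical and the dictionary is only a stand-in for a trained classifier; the real scripts may be organized differently.

```python
import pickle
import tempfile
from pathlib import Path


def parse_train_args(argv):
    """Return (database_filepath, model_filepath) from an argv-style list."""
    if len(argv) != 3:
        raise SystemExit(
            "usage: python train_classifier.py "
            "<database_filepath> <model_filepath>"
        )
    return argv[1], argv[2]


def save_model(model, model_filepath):
    """Serialize the model object to a pickle file."""
    with open(model_filepath, "wb") as f:
        pickle.dump(model, f)


def load_model(model_filepath):
    """Load a pickled model, e.g. in a script that serves predictions."""
    with open(model_filepath, "rb") as f:
        return pickle.load(f)


# In a real script, sys.argv would be passed instead of this list.
database_filepath, model_filepath = parse_train_args(
    ["train_classifier.py", "./Data/DisasterResponse1.db", "./Code/cv_model1.sav"]
)

# Stand-in object; a fitted classifier would be pickled the same way.
placeholder_model = {"note": "stand-in for a trained classifier"}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / Path(model_filepath).name
    save_model(placeholder_model, path)
    restored = load_model(path)

print(database_filepath, model_filepath, restored)
```

Only unpickle model files from a trusted source, since loading a pickle can execute arbitrary code.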

Acknowledgements

Thanks to the Python open source community for creating the valuable libraries used in this project.
This project uses a dataset of messages received in disaster zones and their corresponding categories.

License

Apache license
