We apply a model that uses BERT as a backbone to two similar problems:
- Google QUEST Q&A Labeling: assign 30 scores (each between 0 and 1) to a question-answer pair.
- Tweet sentiment analysis: assign a sentiment (positive, negative, or neutral) to a tweet. This task differs from the original task of the competition.
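The two tasks differ mainly in how the model's final logits are interpreted: 30 independent sigmoid scores for QUEST versus a softmax over three classes for sentiment. A minimal pure-Python sketch of that difference (illustrative only; the variable names and logit values are hypothetical, not the repository's code):

```python
import math

def sigmoid(x):
    """Squash a single logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

# QUEST: 30 independent scores, each in [0, 1] (zero logits -> 0.5 each).
quest_logits = [0.0] * 30
quest_scores = [sigmoid(v) for v in quest_logits]

# Tweet sentiment: a single class out of three (hypothetical logits).
sentiment_logits = [2.0, 0.5, -1.0]
probs = softmax(sentiment_logits)
label = ["positive", "negative", "neutral"][probs.index(max(probs))]
```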
- Download the data from here, modify the variables `DATA_DIR`, `RESULTS_DIR` from `.env` and load it:

  ```shell
  source .env
  ```

- Create a python virtual environment and install the dependencies:

  ```shell
  conda create -n nlu python=3.6 -y
  conda activate nlu
  pip install -r requirements.txt
  python setup.py install
  ```
## Tweet sentiment analysis
- Train the model (remove `size_tr_val` to use the complete dataset; `size_val` refers to the size of the validation dataset):

  ```shell
  python exec/train_tweet_sentiment.py \
      --data_path "${DATA_DIR}/train.csv" \
      --model_dir "${RESULTS_DIR}/models" \
      --log_dir "${RESULTS_DIR}/logs" \
      --size_val 2700 \
      --batch_size 50 \
      --num_epochs 10 \
      --print_freq 200 \
      --seed 10
  ```

- Results from the training can be visualised with tensorboard:

  ```shell
  tensorboard --logdir=${RESULTS_DIR}/logs
  ```

  or within a Jupyter notebook:

  ```
  %reload_ext tensorboard
  %tensorboard --logdir <logs directory>
  ```

  The logs from this training session are available in the `logs` directory (`tensorboard --logdir=logs`).

## Google QUEST Q&A Labeling (WIP)
TODO: the metric logger has to be improved in order to better assess how well the model performs. At the moment we only record the binary cross-entropy for each of the 30 scores that have to be assigned to a question-answer pair.
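The quantity recorded above can be sketched in plain Python (the function and variable names here are illustrative, not the repository's actual logger; the predictions and targets are made-up values):

```python
import math

def binary_cross_entropy(pred, target, eps=1e-7):
    """BCE for a single predicted probability against a 0/1-valued target."""
    p = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

# One BCE value per score; in the real task there would be 30 of them.
preds = [0.9, 0.1, 0.5]
targets = [1.0, 0.0, 1.0]
per_score = [binary_cross_entropy(p, t) for p, t in zip(preds, targets)]
mean_bce = sum(per_score) / len(per_score)
```

A confident correct prediction (0.9 vs target 1) contributes a small loss, while an uncertain one (0.5 vs target 1) contributes roughly `-log(0.5) ≈ 0.693`.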
- Train the model (remove `size_tr_val` to use the complete dataset; `size_val` refers to the size of the validation dataset):

  ```shell
  python exec/train_google_qa.py \
      --data_path ${DATA_DIR}/train.csv \
      --model_dir ${RESULTS_DIR}/models \
      --log_dir ${RESULTS_DIR}/logs \
      --size_tr_val 100 \
      --size_val 40 \
      --batch_size 6 \
      --num_epochs 2 \
      --print_freq 10 \
      --seed 10
  ```
- Make a prediction (only for the first 100 elements from the test set):

  ```shell
  python exec/predict_google_qa.py \
      --data_path ${DATA_DIR}/test.csv \
      --result_dir ${RESULTS_DIR}/results \
      --model_dir ${RESULTS_DIR}/models \
      --load_epoch 1 \
      --batch_size 2 \
      --n_el 100
  ```