Multimodal_Emotion_Analysis

Download Links

Please download both datasets before cloning this repository:

Processed Dataset - http://sorena.multicomp.cs.cmu.edu/downloads/MOSEI/
Raw Dataset - http://sorena.multicomp.cs.cmu.edu/downloads_raw/MOSEI

Set the environment variable: export LC_ALL=C.UTF-8

Random seed initialization (see https://discuss.pytorch.org/t/random-seed-initialization/7854):

    import numpy as np
    import torch

    torch.manual_seed(777)
    torch.cuda.manual_seed(777)
    np.random.seed(777)

Baselines and Metrics:

  The following metrics are defined:
  Metric 1 = MSE with sum across categories [0.73]
  Metric 2 = MAE with sum across categories [0.8686]
  Metric 3 = Huber Loss (Smooth L1 Loss)    [0.3263]
  Metric 4 = Binary classification accuracy at threshold=0.5 [0.xxxx]
  Metric 5 = Weighted accuracy at threshold=0.1 [0.xxxx]
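
For reference, a minimal sketch of how such metrics can be computed on (n_samples, n_emotions) NumPy arrays; the function and variable names (ref, hyp) are illustrative and not part of this repository, and the exact weighted-accuracy definition is an assumption:

    import numpy as np

    def compute_metrics(ref, hyp, bin_thresh=0.5, wacc_thresh=0.1, delta=1.0):
        err = hyp - ref
        mse = np.mean(np.sum(err ** 2, axis=1))       # Metric 1: MSE summed across categories
        mae = np.mean(np.sum(np.abs(err), axis=1))    # Metric 2: MAE summed across categories
        abs_err = np.abs(err)                         # Metric 3: Huber / Smooth L1 loss
        huber = np.mean(np.where(abs_err < delta, 0.5 * err ** 2, delta * (abs_err - 0.5 * delta)))
        # Metric 4: binary classification accuracy after thresholding intensities at 0.5
        bin_acc = np.mean((hyp >= bin_thresh) == (ref >= bin_thresh))
        # Metric 5: weighted accuracy at threshold 0.1, sketched as the average of the
        # accuracies on positive and on negative reference labels
        pos, neg = ref >= wacc_thresh, ref < wacc_thresh
        wacc = 0.5 * (np.mean(hyp[pos] >= wacc_thresh) + np.mean(hyp[neg] < wacc_thresh))
        return {'MSE': mse, 'MAE': mae, 'Huber': huber, 'BinaryAcc': bin_acc, 'WeightedAcc': wacc}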

Attention with sum_{t}{av} instead of mean(av) and m+a+v+t instead of m+(avt)

Model Modality Metric 1 Val Metric 1 Test Metric 1 Train Metric 4 Val Metric 4 Test Metric 5 Val Metric 5 Test Metric 2 Val Metric 2 Test Metric 3 Val Metric 3 Test
Random - 0.63 0.647 0.7938 0.8121 0.2822 0.2946

Attention with sum_{t}{av} instead of mean(av)

Model Modality Metric 1 Val Metric 1 Test Metric 1 Train Metric 4 Val Metric 4 Test Metric 5 Val Metric 5 Test Metric 2 Val Metric 2 Test Metric 3 Val Metric 3 Test
Random - 0.63 0.647 0.7938 0.8121 0.2822 0.2946
Triple Attention-scalar-1024 V+A+T(scalarAttTime)- __4.pth(seed777) -- testclean -- 0.4696 0.4090
Triple Attention-scalar-1024 V+A+T(scalarAttTime)- __4.pth(seed777) -- testaudionoise -- 0.5071 0.4090
Triple Attention-scalar-1024 V+A+T(scalarAttTime)- __4.pth(seed777) -- testvisionnoise -- 0.5034 0.4090
Triple Attention-scalar-1024-mixvisionnoise V+A+T(scalarAttTime)- __4.pth(seed777) -- testclean -- 0.4932 0.4131
Triple Attention-scalar-1024-mixvisionnoise V+A+T(scalarAttTime)- __4.pth(seed777) -- testaudionoise -- 0.5173 0.4131
Triple Attention-scalar-1024-mixvisionnoise V+A+T(scalarAttTime)- __4.pth(seed777) -- testvisionnoise -- 0.4934 0.4131

Old attention (downweighting the memory update by mean(0).unsqueeze(0) instead of sum(0))

Model Modality Metric 1 Val Metric 1 Test Metric 1 Train Metric 4 Val Metric 4 Test Metric 5 Val Metric 5 Test Metric 2 Val Metric 2 Test Metric 3 Val Metric 3 Test
Random - 0.63 0.647 0.7938 0.8121 0.2822 0.2946
Triple Attention V+A+T 0.4765 0.4709
Triple Attention-scalar V+A+T(scalarAttTime) __5.pth 0.5193 0.5346 0.5986
Triple Attention-scalar V+A+T(scalarAttTime) __6.pth 0.5439 0.5520 0.5742
Triple Attention-scalar-1024 pretrained V+A+T (scalarAttTime)- __1.pth 0.5159 0.5072 0.4498
Triple Attention-scalar-1024 pretrained V+A+T (scalarAttTime)- __2.pth 0.5018 0.4866 0.4103
Triple Attention-scalar-1024 pretrained V+A+T (scalarAttTime)- __3.pth 0.5176 0.5043 0.3703
Triple Attention-scalar-1024 V+A+T(scalarAttTime)- __3.pth 0.4816 0.4790 0.4605
Triple Attention-scalar-1024 V+A+T(scalarAttTime)- __4.pth 0.4884 0.4806 0.4345
Triple Attention-scalar-1024 V+A+T(scalarAttTime)- __4.pth(seed777) -- testclean -- 0.4772 0.4239 0.9072 0.9115 0.5900 0.6115
Triple Attention-scalar-1024 V+A+T(scalarAttTime)- __4.pth(seed777) -- testaudionoise -- 0.5221 0.4239
Triple Attention-scalar-1024 V+A+T(scalarAttTime)- __4.pth(seed777) -- testvisionnoise -- 0.5026 0.4239
Triple Attention-scalar-1024 V+A+T(scalarAttTime)- __5.pth 0.4789 0.4745 0.4087
Triple Attention-scalar-1024 V+A+T(scalarAttTime)- __5.pth(seed777) -- testclean -- 0.4806 0.3986
Triple Attention-scalar-1024 V+A+T(scalarAttTime)- __5.pth(seed777) -- testnoise -- 0.3986
Triple Attention-scalar-1024 V+A+T(scalarAttTime)- __6.pth 0.4870 0.4929 0.3830
Triple Attention-scalar-1024-audioablation V+A+T(scalarAttTime)- __2.pth -- testclean 0.4823 0.5005
Triple Attention-scalar-1024-audioablation V+A+T(scalarAttTime)- __3.pth -- testclean 0.4843 0.4703
Triple Attention-scalar-1024-audioablation V+A+T(scalarAttTime)- __4.pth -- testclean *0.4780 0.4439
Triple Attention-scalar-1024-audioablation V+A+T(scalarAttTime)- __5.pth -- testclean 0.4831 0.4180
Triple Attention-scalar-1024-audioablation V+A+T(scalarAttTime)- __2.pth -- testnoise 0.5135 0.5005
Triple Attention-scalar-1024-audioablation V+A+T(scalarAttTime)- __3.pth -- testnoise 0.4781 0.4703
Triple Attention-scalar-1024-audioablation V+A+T(scalarAttTime)- __4.pth -- testnoise *0.4843 0.4439
Triple Attention-scalar-1024-audioablation V+A+T(scalarAttTime)- __5.pth -- testnoise 0.4845 0.4180
Triple Attention-scalar-1024-visionablation V+A+T(scalarAttTime)- __2.pth -- testclean 0.4968 0.4950
Triple Attention-scalar-1024-visionablation V+A+T(scalarAttTime)- __3.pth -- testclean 0.5065 0.4701
Triple Attention-scalar-1024-visionablation V+A+T(scalarAttTime)- __4.pth -- testclean *0.4811 0.4473
Triple Attention-scalar-1024-visionablation V+A+T(scalarAttTime)- __5.pth -- testclean 0.5031 0.4242
Triple Attention-scalar-1024-visionablation V+A+T(scalarAttTime)- __2.pth -- testnoise 0.4987 0.4950
Triple Attention-scalar-1024-visionablation V+A+T(scalarAttTime)- __3.pth -- testnoise 0.5032 0.4701
Triple Attention-scalar-1024-visionablation V+A+T(scalarAttTime)- __4.pth -- testnoise *0.4850 0.4473
Triple Attention-scalar-1024-visionablation V+A+T(scalarAttTime)- __5.pth -- testnoise 0.5072 0.4242
Triple Attention-1024 V+A+T(attElement)- __3.pth 0.4910 0.4859 0.4604
Triple Attention-1024 V+A+T(attElement)- __4.pth 0.5003 0.4793 0.4303
Triple Attention-1024 V+A+T(attTime)- __3.pth 0.4855 0.4919 0.4671
Triple Attention-1024 V+A+T(attTime)- __4.pth 0.4888 0.4889 0.4409
Triple Attention-1024 V+A+T(attTime)- __5.pth 0.4761 0.4816 0.4144
Triple Attention-1024 V+A+T(attTime)- __6.pth 0.4970 0.4973 0.3879
Triple Attention-1024 V+A+T(attTime)- __7.pth 0.5103 0.5121 0.3608
Triple Attention-1024-gated V+A+T(attTime)- __4.pth 0.4746 0.4631 0.4512
Triple Attention-1024-gated V+A+T(attTime)- __5.pth 0.4911 0.4734 0.4225
Triple Attention-scalar-1024-gated V+A+T(scalarAttTime)- __4.pth 0.4812 0.4683 0.4470
Triple Attention-scalar-1024-gated V+A+T(scalarAttTime)- __5.pth 0.4838 0.4649 0.4262
Triple Attention-scalar-1024-gated-k3 V+A+T(scalarAttTime)- __5.pth 0.4812 0.4912 0.4171
Triple Attention-scalar-1024-gated-k3 V+A+T(scalarAttTime)- __4.pth 0.4730 0.4709 0.3993
Triple Attention-scalar-1024-gated-k3 V+A+T(scalarAttTime)- __3.pth 0.4715 0.4602 0.4670 0.9110 0.9142 0.5775 0.5924 0.9066
Triple Attention-scalar-1024-gated-k1 V+A+T(scalarAttTime)- __5.pth 0.5151 0.5034 0.4579
Triple Attention-scalar-1024-gated-k1 V+A+T(scalarAttTime)- __4.pth 0.5088 0.4957 0.4763
Triple Attention-scalar-1024-gated-k1 V+A+T(scalarAttTime)- __3.pth 0.5105 0.5042 0.4953
Triple Attention-scalar-1024-pretrained-gated-k3 V+A+T(scalarAttTime)- __4.pth 0.4789 0.4219
Triple Attention-scalar-1024-pretrained-gated-k3 V+A+T(scalarAttTime)- __5.pth 0.4992 0.4524
Early Concatenation V+A+T
Late Weighting V+A+T __0.pth 0.5175 0.5647
Late Weighting V+A+T __1.pth 0.5047 0.4302
Late Weighting V+A+T __2.pth 0.5098 0.3781
Late Weighting V+A+T __3.pth 0.5413 0.3434
Dual Attention V+T
Dual Attention V+A 0.5157 0.5103
Dual Attention A+T
Early Concatenation V+T
Early Concatenation V+A
Early Concatenation A+T
Late Weighting V+T
Late Weighting V+A
Late Weighting A+T
LSTM + Attention V
LSTM + Attention T 0.6285
LSTM + Attention A
LSTM V 0.5170 0.5106
LSTM T 0.6026 0.6056
LSTM A

Scoring

To add scoring capabilities to your Python script:

  1. Import the scoring function:

         from cmu_score import ComputePerformance

  2. Inside the epoch loop (while epoch < no_of_epochs), initialise NumPy arrays that will accumulate all references and hypotheses:

         overall_hyp = np.zeros((0, no_of_emotions))
         overall_ref = np.zeros((0, no_of_emotions))

  3. After each batch, append the model outputs and the ground-truth values:

         overall_hyp = np.concatenate((overall_hyp, outputs.data.cpu().numpy()), axis=0)
         overall_ref = np.concatenate((overall_ref, gt.data.cpu().numpy()), axis=0)

  4. At the end of the epoch, compute the scores:

         score = ComputePerformance(overall_ref, overall_hyp)
         print('Scoring -- Epoch [%d], Sample [%d], Binary accuracy %.4f' % (epoch+1, K, score['binaryaccuracy']))
         print('Scoring -- Epoch [%d], Sample [%d], MSE %.4f' % (epoch+1, K, score['MSE']))
         print('Scoring -- Epoch [%d], Sample [%d], MAE %.4f' % (epoch+1, K, score['MAE']))
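
Putting the four steps together, a minimal sketch of an evaluation loop; the model, the data loader, and the (inputs, gt) batch format are assumptions, while ComputePerformance and its result keys are used exactly as above:

    import numpy as np
    from cmu_score import ComputePerformance

    def score_model(model, loader, no_of_epochs, no_of_emotions=6):
        # loader is assumed to yield (inputs, ground-truth) batches of emotion intensities
        for epoch in range(no_of_epochs):
            overall_hyp = np.zeros((0, no_of_emotions))
            overall_ref = np.zeros((0, no_of_emotions))
            for K, (inputs, gt) in enumerate(loader):
                outputs = model(inputs)  # (batch, no_of_emotions) predicted intensities
                overall_hyp = np.concatenate((overall_hyp, outputs.data.cpu().numpy()), axis=0)
                overall_ref = np.concatenate((overall_ref, gt.data.cpu().numpy()), axis=0)
            score = ComputePerformance(overall_ref, overall_hyp)
            print('Scoring -- Epoch [%d], Sample [%d], Binary accuracy %.4f' % (epoch + 1, K, score['binaryaccuracy']))
            print('Scoring -- Epoch [%d], Sample [%d], MSE %.4f' % (epoch + 1, K, score['MSE']))
            print('Scoring -- Epoch [%d], Sample [%d], MAE %.4f' % (epoch + 1, K, score['MAE']))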

Structure of Pickle Files

1. Emotions.pkl

Let emo_intsts = array([Anger_Intensity, Disgust_Intensity, Fear_Intensity, Happy_Intensity, Sad_Intensity, Surprise_Intensity]). The file then has the structure:

{"Video Name": {"Segment ID i_1": emo_intsts, "Segment ID i_2": emo_intsts, ..., "Segment ID i_n": emo_intsts}}

There are 23453 segments in total in MOSEI (train + val + test).

To count the segments:

    k = 0
    for i in mosei_emotions.keys():
        for j in mosei_emotions[i].keys():
            k = k + 1

These emotion labels are emotion intensities; out of the 23453 segments, 6542 give indecisive classes (no single emotion has the strictly highest intensity).

To count the indecisive segments:

    k2 = 0
    for i in mosei_emotions.keys():
        for j in mosei_emotions[i].keys():
            if (max(mosei_emotions[i][j]) == min(mosei_emotions[i][j])
                    or sorted(mosei_emotions[i][j], reverse=True)[0] == sorted(mosei_emotions[i][j], reverse=True)[1]):
                k2 = k2 + 1

Train Set Emotion Intensity Stats:
    0-1 = 94964
    1-2 = 3275 
    2-3 = 515 
    Max Intensity  = 3.0
    Min Intensity  = 0.0 
    Mean Intensity = 0.17
    Mean Non-Zero Intensity = 0.74
    Mean Per Emotion Intensity = [ 0.1565  0.1233  0.0401  0.4836  0.1596  0.04842]
Validation Set Emotion Intensity Stats:
    0-1 = 11031 
    1-2 = 278 
    2-3 = 37 
    Max Intensity  = 3.0
    Min Intensity  = 0.0
    Mean Intensity = 0.15
    Mean Non-Zero Intensity = 0.68
    Mean Per Emotion Intensity = [ 0.1207   0.0888  0.0436  0.4341  0.1656  0.0497]
Test Set Emotion Intensity Stats:
    0-1 = 29574 
    1-2 = 914 
    2-3 = 130 
    Max Intensity  = 3.0
    Min Intensity  = 0.0
    Mean Intensity = 0.16
    Mean Non-Zero Intensity = 0.72
    Mean Per Emotion Intensity = [ 0.1602  0.1140   0.0409   0.4685  0.1407  0.0437]
2. Words.pkl
3. Embeddings.pkl

To load the embeddings from CMU-MOSEI, run the following:

  1. Build softlinks in mmdata/data/pickled/:

         $ cd mmdata/data/pickled/
         $ ln -s ../../../*.pkl .

  2. Run:

         $ python3 creating_text_files-SDKload.py

Two folders will be created: text_files_segbased (segment-based embeddings) and text_files_videobased (segment-based embeddings, but each embedding file has a scope covering the whole video).

4. Train/Test/Valid.pkl

Contains a set of all the train/test/validation video names
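
As an illustration, a minimal sketch of loading one of the split files and counting its videos; the exact file name Train.pkl is an assumption based on this section's title:

    import pickle

    with open('Train.pkl', 'rb') as f:
        train_videos = pickle.load(f)   # a set of video names

    print(len(train_videos), 'training videos')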

Length of Dataset - 3228 Videos divided into 22677 Video Clips of ~3-8 seconds
Length of Training Set - 2250 Videos divided into 16303 Video Clips
Length of Validation Set - 300 Videos divided into 1861 Video Clips
Length of Test Set - 678 Videos divided into 4645 Video Clips
Length of Truncated Set:
    Train - 11112
    Test - 3303
    Val - 1341

Train Set Video Length Stats:
    0-2 = 322
    2-4 = 2975
    4-6 = 3979
    6-8 = 3340
    8-10 = 2111
    10-15 = 2398
    15-20 = 780
    20+ = 398
Validation Set Video Length Stats:
    0-2 = 28
    2-4 = 260
    4-6 = 434
    6-8 = 400
    8-10 = 291
    10-15 = 324
    15-20 = 80
    20+ = 44
Test Set Video Length Stats:
    0-2 = 80
    2-4 = 845
    4-6 = 1246
    6-8 = 1019
    8-10 = 743
    10-15 = 689
    15-20 = 214
    20+ = 139
5. Facet.pkl

Let facet_features = array([feature_1_val, feature_2_val, ..., feature_35_val]). There are 35 features for each frame. The pickle has the structure:

{ "facet": {"Video Name": {"Segment ID i_1": ((start_time_frame_1, end_time_frame_1, facet_features), ..., (start_time_frame_n, end_time_frame_n, facet_features)), "Segment ID i_2": ..., ..., "Segment ID i_n": ...}}}

6. Sentiments.pkl
7. Covarep.pkl

Let covarep_features = array([feature_1_val, feature_2_val, ..., feature_74_val]). COVAREP features are extracted at 0.01 s (10 ms) intervals, the original sampling rate used by the COVAREP authors, so there are 74 features for each 0.01 s segment. The pickle has the structure:

{ "facet": {"Video Name": {"Segment ID i_1": ((start_time_frame_1, end_time_frame_1, covarep_features), ..., (start_time_frame_n, end_time_frame_n, covarep_features)), "Segment ID i_2": ..., ..., "Segment ID i_n": ...}}}

Because some files contain only 43 features, the following numbers of segments had vision features available but could not be used for audio (and were hence removed from all three folds):

  1. Test : 1322
  2. Train : 5105
  3. Val : 494

TO DOs After Cloning the Repository

  1. Put all downloaded pickle files in the same directory.
  2. Run creating_audio_files.py using Python 2.7 (this might take a while depending on your machine). This generates a folder of audio files containing the COVAREP features, named after the corresponding videos.
  3. Run transfer_valid_audio.py
  4. Run dual_attention.py
