Top 4% solution for the Freesound General-Purpose Audio Tagging Challenge

This was my first time handling a sound classification problem, because I had missed a similar competition on Kaggle. Sadly, I did not realize that this was a research competition, which means no medals were awarded, although my solution ranked 19th. I think that is why not many people joined it. Still, I learned how to deal with similar problems.

A Bad Start

It seemed natural to use RNN architectures, since sound is a time-series signal and the provided clips vary in length, which RNNs can handle as a whole without trimming. But I got no good results from LSTMs or GRUs, no matter how deep they were.

Then I tried trimming the sounds to the first 5 seconds and feeding them into 1D CNNs. I even tried transforming the sounds into MFCC 2D images and training a 2D classifier on them. Still, none of these methods gave a really good result.
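The trimming step can be sketched as follows. This is a minimal illustration, assuming mono waveforms stored as NumPy arrays; the function name `fix_length`, the 16 kHz sample rate, and zero-padding of shorter clips are assumptions for the example, not details taken from this solution.

```python
import numpy as np

def fix_length(wave: np.ndarray, sample_rate: int, seconds: float = 5.0) -> np.ndarray:
    """Keep the first `seconds` of a mono waveform, zero-padding shorter clips."""
    target = int(sample_rate * seconds)
    if len(wave) >= target:
        return wave[:target]
    return np.pad(wave, (0, target - len(wave)), mode="constant")

# Hypothetical clips: 7 s and 2 s of noise at an assumed 16 kHz sample rate.
sr = 16000
rng = np.random.default_rng(0)
long_clip = rng.standard_normal(7 * sr).astype(np.float32)
short_clip = rng.standard_normal(2 * sr).astype(np.float32)

# Once every clip has the same length, the clips can be stacked into one batch.
batch = np.stack([fix_length(long_clip, sr), fix_length(short_clip, sr)])
print(batch.shape)  # (2, 80000)
```

A fixed length is what lets a 1D CNN consume the clips in batches; the cost is that anything after the first 5 seconds is discarded.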

Searching the internet and the Kaggle forum gave me no good new ideas.

Master's Insight

A blog post from bestfitting, who reached 1st place in Kaggle's global ranking in only one year, gave me a good suggestion: read up on all the best solutions from previous competitions. This is truly golden advice, and it turns Kaggle from a competition platform into the best machine learning library in the world. I found a very similar competition from just a few months earlier, the TensorFlow Speech Recognition Challenge, in which the winner turned the sounds into mel spectrogram 2D images and then trained an image classifier on them.
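To make the "sound to image" step concrete, here is a minimal NumPy-only sketch of a log-mel spectrogram: a short-time Fourier transform followed by a triangular mel filterbank. The frame size, hop length, number of mel bands, and the 440 Hz test tone are assumptions chosen for illustration, not the settings used by the winner or in this solution. In practice a library routine such as librosa's `librosa.feature.melspectrogram` would normally be used instead.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(wave, sr, n_fft=1024, hop=512, n_mels=64):
    """Return a (n_mels, n_frames) image in decibels."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T                 # (n_mels, n_frames)

# Hypothetical input: one second of a 440 Hz tone at an assumed 16 kHz sample rate.
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
spec = log_mel_spectrogram(tone, sr)
print(spec.shape)  # (64, 30)
```

The resulting 2D array can be treated like a single-channel image, which is what allows an ordinary image classifier to be trained on it.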

I followed this method and got a better score on the leaderboard. I got even better scores after I started training from the pretrained weights that ship with Keras. From this competition, I really began to realize that the quality of the starting pretrained weights matters a lot in all kinds of competitions.

Stack Them All

My best solution was to train one big network that contained as many models as possible, each working on a different representation of the sound, including:

Image classifier on mel spectrograms
Image classifier on MFCC images
1D CNNs on the first 5 seconds
GBTs (gradient boosted trees) on statistical features from the sounds
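As an illustration of the last item, the sketch below computes a few summary statistics per clip. The text does not say which statistics were used, so this particular feature set (mean, standard deviation, min, max, skewness, kurtosis, zero-crossing rate, RMS) and the function name `clip_statistics` are assumptions.

```python
import numpy as np

def clip_statistics(wave: np.ndarray) -> np.ndarray:
    """Summarize a mono waveform as a fixed-length feature vector (assumed feature set)."""
    wave = wave.astype(np.float64)
    mean, std = wave.mean(), wave.std()
    centered = wave - mean
    skew = (centered ** 3).mean() / (std ** 3 + 1e-12)
    kurt = (centered ** 4).mean() / (std ** 4 + 1e-12)
    # Fraction of adjacent sample pairs whose sign differs.
    zcr = np.mean(np.signbit(wave[1:]) != np.signbit(wave[:-1]))
    rms = np.sqrt(np.mean(wave ** 2))
    return np.array([mean, std, wave.min(), wave.max(), skew, kurt, zcr, rms])

# Hypothetical input: one second of a 440 Hz tone at an assumed 16 kHz sample rate.
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
feats = clip_statistics(tone)
print(feats.shape)  # (8,)
```

Because every clip maps to a vector of the same size regardless of its duration, the rows can be stacked into a table and passed to any gradient boosted tree library.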

I really had fun doing this, since it was my first time implementing stacking, and it really worked.
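The general idea of stacking can be sketched with out-of-fold predictions: each base model predicts only rows it did not train on, and a second-level model learns from those predictions. Everything in the sketch is a stand-in chosen to keep it NumPy-only: `CentroidModel` replaces the real CNNs and GBTs, the two synthetic "views" replace the real sound representations, and ridge regression replaces whatever second-level learner was actually used, which the text does not specify.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class CentroidModel:
    """Toy base model: negative squared distance to class centroids, as probabilities."""
    def fit(self, X, y, n_classes):
        self.centroids = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
        return self

    def predict_proba(self, X):
        d2 = ((X[:, None, :] - self.centroids[None]) ** 2).sum(axis=2)
        return softmax(-d2)

def out_of_fold_probs(X, y, n_classes, n_folds=5, seed=0):
    """Predict every row with a model that never saw that row during training."""
    idx = np.random.default_rng(seed).permutation(len(X))
    oof = np.zeros((len(X), n_classes))
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        model = CentroidModel().fit(X[train], y[train], n_classes)
        oof[fold] = model.predict_proba(X[fold])
    return oof

def fit_meta(features, y, n_classes, ridge=1e-3):
    """Second-level model: ridge regression onto one-hot labels (a simple stand-in)."""
    A = np.hstack([features, np.ones((len(features), 1))])
    Y = np.eye(n_classes)[y]
    return np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ Y)

# Synthetic data: two noisy "views" of the same labels, standing in for
# different representations of the same clips.
rng = np.random.default_rng(0)
n, n_classes = 300, 3
y = rng.integers(0, n_classes, size=n)
view_a = np.eye(n_classes)[y] + rng.normal(0, 0.8, (n, n_classes))
view_b = np.eye(n_classes)[y] @ rng.normal(0, 1, (n_classes, 4)) + rng.normal(0, 0.8, (n, 4))

# Level 1: out-of-fold probabilities from one base model per view.
stacked = np.hstack([out_of_fold_probs(v, y, n_classes) for v in (view_a, view_b)])

# Level 2: fit the meta-model on the stacked probabilities.
W = fit_meta(stacked, y, n_classes)
pred = (np.hstack([stacked, np.ones((n, 1))]) @ W).argmax(axis=1)
accuracy = (pred == y).mean()  # in-sample only; a real pipeline would score held-out data
print(stacked.shape)  # (300, 6)
```

Using out-of-fold predictions matters because a second-level model trained on in-fold predictions would learn to trust base models that have simply memorized their training rows.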

