Trains low-rank CNNs from raw speech using Keras/TensorFlow, with inputs from Kaldi directories.
- Trains CNNs from a Kaldi GMM system
- Works with standard Kaldi data and alignment directories
- Decodes test utterances in Kaldi style
- Python 3.4+
- Keras with a TensorFlow or Theano backend
- Kaldi (obtained using git)
- Train a GMM system in Kaldi.
- Place steps_kt and run_*.sh in the working directory.
- Apply the patch compute-raw-feats.patch to Kaldi. To do this:

      . ./path.sh    ## To get the $KALDI_ROOT environment variable.
      mv compute-raw-feats.patch $KALDI_ROOT/
      cd $KALDI_ROOT/
      git apply compute-raw-feats.patch
      cd src/
      make depend    ## [-j 4]
      make           ## [-j 4]

  Note: this creates a new executable compute-raw-feats in the src/featbin/ directory of Kaldi. It does not alter any of the existing Kaldi tools.
- Extract raw features using extract.sh.
- Configure and run run_*.sh.
  - run_rawcnn.sh trains triphone models. Provide the model architecture as an argument; see steps_kt/model_architecture.py for valid options. Optionally, provide a CNN directory to initialise the model weights from; the architecture is expected to be the same, except for the output layer. This is useful for initialising a triphone CNN from a monophone CNN.
  - run_rawcnn_mono.sh trains monophone models; the model architecture is its only argument. After training a CNN, it computes forced alignments and re-trains the CNN. This expectation-maximisation is performed for two iterations to obtain better models.
- train*_rawcnn.py is the Keras training script.
- Model architecture can be configured in model_architecture.py (an illustrative sketch follows after this list).
- dataGeneratorSRaw.py provides an object that reads Kaldi data and alignment directories in batches and retrieves mini-batches for training (a sketch of the idea follows after this list).
- nnet-forward-norm-arch.py passes test features through the trained CNNs and outputs log posterior probabilities in Kaldi format (see the sketch after this list).
- kaldiIO.py reads and writes Kaldi-type binary features (see the format sketch after this list).
- decode_norm_arch.py is the decoding script.
- align_norm_arch.sh is the alignment script.
- compute_priors.py computes the class priors (see the sketch after this list).
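For orientation, here is a minimal sketch of the kind of model an architecture function might return, assuming raw-waveform input windows and the Keras Sequential API. The function name `example_raw_cnn`, the layer sizes and the bottleneck dimension are illustrative placeholders, not the options actually defined in steps_kt/model_architecture.py:

```python
# Illustrative only: a small 1-D CNN over raw samples with a low-rank dense layer.
# Not the repository's actual architecture (see steps_kt/model_architecture.py).
from keras.models import Sequential
from keras.layers import Reshape, Conv1D, Flatten, Dense

def example_raw_cnn(input_dim, num_classes, bottleneck_dim=64):
    """Return a simple raw-speech CNN ending in a softmax over CD/CI states."""
    model = Sequential([
        Reshape((input_dim, 1), input_shape=(input_dim,)),      # raw samples -> (time, 1)
        Conv1D(40, kernel_size=30, strides=10, activation='relu'),
        Conv1D(60, kernel_size=7, strides=3, activation='relu'),
        Flatten(),
        Dense(bottleneck_dim, activation='linear'),              # low-rank (bottleneck) layer
        Dense(1024, activation='relu'),
        Dense(num_classes, activation='softmax'),
    ])
    return model

model = example_raw_cnn(input_dim=4000, num_classes=2000)
model.summary()
```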
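The next sketch illustrates the general pattern behind a mini-batch generator for Keras training: shuffle the frames and yield fixed-size batches with one-hot targets. It works on in-memory NumPy arrays rather than Kaldi data and alignment directories, so it is only a stand-in for what dataGeneratorSRaw.py actually does:

```python
# Illustrative only: the idea behind a Keras mini-batch generator.
# The real dataGeneratorSRaw.py reads Kaldi data and alignment directories.
import numpy as np

def minibatch_generator(features, labels, num_classes, batch_size=256):
    """Yield shuffled (x, y) mini-batches indefinitely, with one-hot targets."""
    num_frames = len(features)
    while True:
        order = np.random.permutation(num_frames)
        for start in range(0, num_frames - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            x = features[idx]
            y = np.zeros((batch_size, num_classes), dtype=np.float32)
            y[np.arange(batch_size), labels[idx]] = 1.0
            yield x, y

# Example usage with random data standing in for raw-speech frames:
feats = np.random.randn(10000, 4000).astype(np.float32)
labs = np.random.randint(0, 2000, size=10000)
gen = minibatch_generator(feats, labs, num_classes=2000)
x, y = next(gen)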
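The forward pass itself amounts to taking the log of the network's softmax outputs; in hybrid systems the log priors are often subtracted as well to obtain scaled likelihoods, which is a common practice rather than a statement about what nnet-forward-norm-arch.py does internally. Feature reading, model loading and Kaldi-format writing are assumed to happen elsewhere (e.g. via kaldiIO.py); the arrays below are random stand-ins:

```python
# Illustrative only: turning network outputs into log posteriors
# (optionally prior-normalised) before Kaldi decoding.
import numpy as np

def log_posteriors(posteriors, priors=None, floor=1e-30):
    """Return log posteriors; subtract log priors if given (hybrid-ASR practice)."""
    logp = np.log(np.maximum(posteriors, floor))
    if priors is not None:
        logp -= np.log(np.maximum(priors, floor))
    return logp

post = np.random.dirichlet(np.ones(2000), size=500).astype(np.float32)  # fake model output
priors = np.full(2000, 1.0 / 2000, dtype=np.float32)                    # fake priors
out = log_posteriors(post, priors)
```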
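As background on the binary format, the sketch below reads one key/matrix pair from an uncompressed binary Kaldi archive (the "\0B" marker followed by an "FM " single-precision matrix). It is not the repository's kaldiIO.py, the 'feats.ark' path is hypothetical, and compressed "CM" matrices would need extra handling:

```python
# Illustrative only: reading one <key, matrix> pair from a Kaldi binary archive.
import struct
import numpy as np

def read_kaldi_matrix(fd):
    """Read one utterance key and its float32 matrix from an open binary ark."""
    key = b''
    while True:
        c = fd.read(1)
        if not c:
            return None, None            # end of archive
        if c == b' ':
            break
        key += c
    assert fd.read(2) == b'\0B'          # binary-mode marker
    assert fd.read(3) == b'FM '          # single-precision matrix token
    assert fd.read(1) == b'\x04'         # size of the following int32
    rows = struct.unpack('<i', fd.read(4))[0]
    assert fd.read(1) == b'\x04'
    cols = struct.unpack('<i', fd.read(4))[0]
    data = np.frombuffer(fd.read(4 * rows * cols), dtype=np.float32)
    return key.decode(), data.reshape(rows, cols)

with open('feats.ark', 'rb') as fd:      # hypothetical archive path
    utt, mat = read_kaldi_matrix(fd)
```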
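Prior estimation itself is just relative frequency counting over the aligned frames. The sketch below shows that idea; compute_priors.py may differ in how it reads the alignments and where it stores the result:

```python
# Illustrative only: estimating class priors from training alignments.
import numpy as np

def estimate_priors(alignment_labels, num_classes):
    """Relative frequency of each output class over all aligned frames."""
    counts = np.bincount(alignment_labels, minlength=num_classes).astype(np.float64)
    return counts / counts.sum()

labels = np.random.randint(0, 2000, size=100000)    # stand-in for alignment labels
priors = estimate_priors(labels, num_classes=2000)
np.save('priors.npy', priors)                        # hypothetical output file
```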
The training script uses stochastic gradient descent with a momentum of 0.5. It starts with a learning rate of 0.1 and trains for a minimum of 5 epochs. Whenever the validation loss decreases by less than 0.002 between successive epochs, the learning rate is halved; halving is performed a total of 18 times.
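A minimal sketch of that schedule, assuming a Keras model and fit_generator-style data generators; the function `train_with_halving` and its arguments are illustrative placeholders, not the actual code in train*_rawcnn.py:

```python
# Illustrative only: SGD with momentum 0.5, initial learning rate 0.1, a minimum
# of 5 epochs, halving the learning rate when the validation loss improves by
# less than 0.002, for at most 18 halvings.
import keras.backend as K
from keras.optimizers import SGD

def train_with_halving(model, train_gen, steps_train, val_gen, steps_val,
                       init_lr=0.1, min_epochs=5, threshold=0.002, max_halvings=18):
    model.compile(optimizer=SGD(lr=init_lr, momentum=0.5),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    prev_val_loss = float('inf')
    halvings = 0
    epoch = 0
    while halvings < max_halvings:
        model.fit_generator(train_gen, steps_per_epoch=steps_train, epochs=1, verbose=2)
        val_loss = model.evaluate_generator(val_gen, steps=steps_val)[0]
        epoch += 1
        # After the minimum number of epochs, halve the learning rate whenever
        # the validation loss improves by less than the threshold.
        if epoch >= min_epochs and prev_val_loss - val_loss < threshold:
            halvings += 1
            K.set_value(model.optimizer.lr, K.get_value(model.optimizer.lr) / 2.0)
        prev_val_loss = val_loss
    return model
```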
Idiap Research Institute
Authors: S. Pavankumar Dubagunta, Vinayak Abrol and Mathew Magimai-Doss.
GNU GPL v3