Eugene Laksana, Sayan Ghosh, Stefan Scherer
University of Southern California, Institute for Creative Technologies
A Python-based neural network batch iterator with multimodal support. It first splits the dataset into separate chunks and assigns concurrent workers to populate a queue from their assigned chunks. A single process then retrieves batches from this queue via the get_batch function. The current version supports both numerical and lexical data.
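The chunk-and-queue pattern described above can be sketched roughly as follows. This is not FastBatch's actual code: the names `split_into_chunks`, `worker`, and `iterate_batches` are hypothetical, the data is a plain Python list rather than HDF5 content, and threads stand in for the concurrent workers to keep the example short.

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2.7


def split_into_chunks(items, k):
    """Deal items round-robin into k chunks (hypothetical helper)."""
    return [items[i::k] for i in range(k)]


def worker(chunk, batch_size, out_queue):
    """Put batches of at most batch_size items from one chunk on the queue."""
    for start in range(0, len(chunk), batch_size):
        out_queue.put(chunk[start:start + batch_size])
    out_queue.put(None)  # sentinel: this worker has finished its chunk


def iterate_batches(items, k, batch_size, max_queued=8):
    """Yield batches produced by k workers until every worker is done."""
    q = queue.Queue(maxsize=max_queued)
    for chunk in split_into_chunks(items, k):
        t = threading.Thread(target=worker, args=(chunk, batch_size, q))
        t.daemon = True  # do not block interpreter exit
        t.start()
    finished = 0
    while finished < k:
        batch = q.get()  # single consumer, analogous to get_batch
        if batch is None:
            finished += 1
        else:
            yield batch
```

Batches arrive in whatever order the workers produce them, so the consumer should not rely on ordering across chunks. The bounded queue keeps the workers from running far ahead of the consumer.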
- Python 2.7.x
- numpy
- h5py
The iterator currently supports only the HDF5 file format. As an input parameter, FastBatch takes the path to a file list, which contains the paths to the HDF5 files separated by newline characters. We have provided some working examples of our input format in the ex folder.
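As a rough illustration of the file list format, the sketch below writes and reads a newline-separated list of paths using only the standard library. The helpers `write_file_list` and `read_file_list` are hypothetical and not part of FastBatch, and skipping blank lines is an assumption about how such a list would reasonably be parsed.

```python
def write_file_list(list_path, hdf5_paths):
    """Write one HDF5 path per line (hypothetical helper)."""
    with open(list_path, 'w') as f:
        f.write('\n'.join(hdf5_paths) + '\n')


def read_file_list(list_path):
    """Return the non-empty lines of a file list, stripped of whitespace."""
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]
```

For example, `write_file_list('./my_file_list.txt', ['./data/a.h5', './data/b.h5'])` would produce a two-line file whose path could then be passed as the file list argument; the paths shown here are placeholders.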
There are currently two working versions of fastbatch: fastbatch and fastbatch_term.
from fastbatch import fastbatch
import time, sys, os

if __name__ == '__main__':
    file_list_name = './ex/file_list.txt'
    feat_dsname = 'openSMILE_features'
    word_dsname = 'words'
    new_file_list_prefix = './file_list_'
    k = 3
    batch_size = 20
    num_timesteps = 20
    shift_step = 1

    pdi = fastbatch(file_list_name, feat_dsname, word_dsname,
                    new_file_list_prefix, k, batch_size, num_timesteps,
                    shift_step,
                    feat_list = ['f0', 'shimmer', 'jitter', 'voicing', 'rmsenergy'],
                    remove_sp = True)

    time_count = 0
    st = time.time()
    # Fetch 100 batches, reporting the elapsed time every 10 batches.
    for i in range(0, 100):
        sys.stdout.flush()
        a,b,c,d = pdi.get_batch()
        if i % 10 == 0:
            ed = time.time()
            print str(i) + 'th batch: ' + str(ed - st)

    # Total elapsed time for all 100 batches.
    ed = time.time()
    print ed - st
    sys.stdout.flush()