FastBatch: A Multimodal Parallelized Neural Network Iterator for Python 2.7

Eugene Laksana, Sayan Ghosh, Stefan Scherer
University of Southern California, Institute for Creative Technologies

Description

A Python-based neural network batch iterator with multimodal support. It first splits the dataset into separate chunks and assigns concurrent workers to populate a queue from their respective chunks. A single process then retrieves batches from this queue via the get_batch function. This version supports both numerical and lexical features.
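The chunk-and-queue pattern described above can be sketched roughly as follows. This is not FastBatch's implementation: it uses threads instead of processes to stay short, and `split_into_chunks`, `worker`, and `run_demo` are hypothetical names chosen for illustration.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2.7


def split_into_chunks(items, k):
    """Split items into k contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), k)
    chunks, start = [], 0
    for i in range(k):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def worker(chunk, batch_size, out_queue):
    """Push full batches from one chunk onto the shared queue.

    A trailing partial batch is dropped in this sketch.
    """
    for i in range(0, len(chunk) - batch_size + 1, batch_size):
        out_queue.put(chunk[i:i + batch_size])


def run_demo(data, k, batch_size):
    """Start k workers, wait for them, then drain the queue from one consumer."""
    out_queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(chunk, batch_size, out_queue))
               for chunk in split_into_chunks(data, k)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    batches = []
    while not out_queue.empty():
        batches.append(out_queue.get())
    return batches


batches = run_demo(list(range(12)), k=3, batch_size=2)
```

The order in which batches arrive depends on worker scheduling, so a consumer should not rely on batches following the order of the file list.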

Requirements

Python 2.7.x
numpy
h5py

Data Input Format

The iterator currently supports only the hdf5 file format. As an input parameter, FastBatch takes the path to a file list, which contains the paths to the hdf5 files separated by newline characters. Working examples of the input format are provided under the ex folder.
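As a rough illustration of this format, the sketch below writes a small file list and reads it back. The helper names `read_file_list` and `load_dataset` and the paths are hypothetical, and `load_dataset` only shows how a named dataset could be read with h5py; it is not taken from FastBatch.

```python
import os
import tempfile


def read_file_list(file_list_path):
    """Return the non-empty lines of a newline-separated file list."""
    with open(file_list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def load_dataset(hdf5_path, dataset_name):
    """Read one named dataset from an hdf5 file into memory (requires h5py)."""
    import h5py  # imported lazily so read_file_list works without h5py
    with h5py.File(hdf5_path, 'r') as hdf5_file:
        return hdf5_file[dataset_name][...]


# Build a throwaway file list with hypothetical paths to show the format.
tmp_dir = tempfile.mkdtemp()
list_path = os.path.join(tmp_dir, 'file_list.txt')
with open(list_path, 'w') as handle:
    handle.write('data/session_01.hdf5\ndata/session_02.hdf5\n\n')

paths = read_file_list(list_path)
```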

Usage

There are currently two working versions of fastbatch: fastbatch and fastbatch_term.

fastbatch: continues to populate the concurrent queue indefinitely by looping over the file list.

fastbatch_term: returns a None tuple after the last element in the file list has been reached.
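The difference between the two versions mostly affects how the consuming loop is written. The sketch below uses two stand-in classes (not the real FastBatch classes) and assumes that "None tuple" means a tuple whose entries are all None.

```python
import itertools


class LoopingIterator(object):
    """Stand-in for the looping behaviour: cycles over the items forever."""

    def __init__(self, items):
        self._source = itertools.cycle(items)

    def get_batch(self):
        return next(self._source)


class TerminatingIterator(object):
    """Stand-in for the terminating behaviour: signals the end with Nones."""

    def __init__(self, items):
        self._source = iter(items)

    def get_batch(self):
        # Assumption: the end-of-data signal is a 4-tuple of None values.
        return next(self._source, (None, None, None, None))


# Three made-up batches, each a 4-tuple like the one get_batch returns.
items = [('w%d' % i, 's%d' % i, 'f%d' % i, 'p%d' % i) for i in range(3)]

# Looping version: the caller decides how many batches to draw.
looping = LoopingIterator(items)
first_five = [looping.get_batch()[0] for _ in range(5)]

# Terminating version: the caller stops when the None tuple arrives.
terminating = TerminatingIterator(items)
seen = []
while True:
    orig, shift, feat, path = terminating.get_batch()
    if orig is None:
        break
    seen.append(orig)
```

With the looping version the training loop needs its own stopping condition, such as a fixed number of batches; with the terminating version one pass over the file list ends naturally.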

Both versions employ the same parameters:

Mandatory

file_list_name: path to the file list.

feat_dsname: the dataset name of the numerical features (e.g. openSMILE_features).

word_dsname: the dataset name of the lexical features

new_file_list_prefix: prefix for the file list copies that will be generated for each process to work on.

k: number of chunks to make. k workers are assigned to the k chunks of the file list, and each worker populates the queue from its own chunk.

batch_size: number of datapoints per batch.

num_timesteps: number of batches desired at a time.
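To make the roles of k and new_file_list_prefix concrete, here is a minimal sketch of splitting one file list into k per-worker copies. The function name, the `prefix + index + '.txt'` naming scheme, and the round-robin assignment are all assumptions; FastBatch may name and divide its copies differently.

```python
import os
import tempfile


def write_chunked_file_lists(paths, prefix, k):
    """Write k file lists named prefix + index, dealing paths out round-robin."""
    chunk_names = []
    for index in range(k):
        chunk_name = '%s%d.txt' % (prefix, index)  # naming scheme is assumed
        with open(chunk_name, 'w') as handle:
            for path in paths[index::k]:
                handle.write(path + '\n')
        chunk_names.append(chunk_name)
    return chunk_names


tmp_dir = tempfile.mkdtemp()
paths = ['file_%d.hdf5' % i for i in range(7)]  # hypothetical hdf5 paths
chunk_names = write_chunked_file_lists(
    paths, os.path.join(tmp_dir, 'file_list_'), k=3)
```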

Optional

shift_step: number of shifts requested for the shifted array (default: 1)

word_to_id: Sayan's parameter. (default: None)

feat_list: only works with our modified openSMILE IS11 features. Leave as None if irrelevant (default: None)

remove_sp: removes sp tokens, which are non-word sounds as labeled in the Fisher dataset. Leave as False if irrelevant (default: False)
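As an illustration of what remove_sp implies, the sketch below filters sp tokens out of a word sequence. It assumes one feature row per word, so that the matching rows are dropped as well; the README does not state how FastBatch aligns words and features, and `drop_sp` is a hypothetical helper.

```python
def drop_sp(words, features, sp_token='sp'):
    """Remove sp tokens and the feature rows aligned with them."""
    kept = [(w, f) for w, f in zip(words, features) if w != sp_token]
    kept_words = [w for w, _ in kept]
    kept_features = [f for _, f in kept]
    return kept_words, kept_features


words = ['hello', 'sp', 'how', 'are', 'sp', 'you']
features = [[0.1], [0.0], [0.3], [0.4], [0.0], [0.6]]  # one made-up row per word
clean_words, clean_features = drop_sp(words, features)
```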

Return

orig_word_mat: original word matrix

shift_word_mat: shifted word matrix

feat_mat: feature matrix

hdf5_filepath_mat: paths of the hdf5 files from which the original word matrix came.
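One plausible reading of the original and shifted word matrices is that the shifted one holds the same sequence offset by shift_step positions, as is common for next-word prediction. The sketch below illustrates that reading on a single sequence; it is an assumption, not a description of FastBatch's actual output, and `make_shifted_pair` is a hypothetical helper.

```python
import numpy as np


def make_shifted_pair(word_ids, num_timesteps, shift_step=1):
    """Return two windows of length num_timesteps offset by shift_step."""
    word_ids = np.asarray(word_ids)
    orig = word_ids[:num_timesteps]
    shifted = word_ids[shift_step:shift_step + num_timesteps]
    return orig, shifted


word_ids = np.arange(10, 18)  # hypothetical word ids: 10, 11, ..., 17
orig, shifted = make_shifted_pair(word_ids, num_timesteps=5, shift_step=1)
```

Under this reading, position t of the shifted window holds the word that follows position t of the original window when shift_step is 1.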

Sample Usage

from fastbatch import fastbatch
import sys
import time

if __name__ == '__main__':
	# Input file list and the hdf5 dataset names to read from each file.
	file_list_name = './ex/file_list.txt'
	feat_dsname = 'openSMILE_features'
	word_dsname = 'words'
	# Prefix for the per-worker copies of the file list.
	new_file_list_prefix = './file_list_'
	k = 3
	batch_size = 20
	num_timesteps = 20
	shift_step = 1
	pdi = fastbatch(file_list_name, feat_dsname, word_dsname, new_file_list_prefix, k, batch_size, num_timesteps, shift_step, feat_list=['f0', 'shimmer', 'jitter', 'voicing', 'rmsenergy'], remove_sp=True)
	# Fetch 100 batches and report the elapsed time every 10 batches.
	st = time.time()
	for i in range(100):
		orig_word_mat, shift_word_mat, feat_mat, hdf5_filepath_mat = pdi.get_batch()
		if i % 10 == 0:
			ed = time.time()
			print(str(i) + 'th batch: ' + str(ed - st))
			sys.stdout.flush()
	# Total time for all 100 batches.
	ed = time.time()
	print(ed - st)
	sys.stdout.flush()
