This repository was archived by the owner on Nov 17, 2023. It is now read-only.
Closed
92 commits
c3edf17
init MXDataset and python test
zhreshold Jan 14, 2020
bf25545
revert makefile
zhreshold Jan 14, 2020
8386f03
fix python binder
zhreshold Jan 15, 2020
8098e77
fix ImageSequenceDataset
zhreshold Jan 15, 2020
2e62c22
add NDArrayDataset and TupleDataset
zhreshold Jan 15, 2020
6f809c5
use attr handle rather than NDArraybase
zhreshold Jan 15, 2020
e92518f
fix image copy
zhreshold Jan 16, 2020
b925d17
fix deleted ndarray reference in python
zhreshold Jan 16, 2020
b410909
fix mismatch ndarray shape and scalar case
zhreshold Jan 17, 2020
20046ee
create __handle__ only when required
zhreshold Jan 17, 2020
d963cb3
remove mnist data
zhreshold Jan 17, 2020
04ead8e
init sampler and dataloader
zhreshold Jan 17, 2020
34ddf7b
fix samplers
zhreshold Jan 17, 2020
36621b9
fix sampler registration
zhreshold Jan 18, 2020
4b01028
backup changes
zhreshold Jan 21, 2020
a55c032
add randomsamplerIter
zhreshold Jan 22, 2020
d41af7c
fix sampler and batch sampler, need to fix first batch cache
zhreshold Jan 23, 2020
ca00dc4
fix prefetching caused missing data
zhreshold Jan 24, 2020
83a3ad2
add registry for batchify functions
zhreshold Jan 24, 2020
ccd4063
fix dataset init, make dataset const
zhreshold Jan 25, 2020
c0bc373
add python binding for threadedDataLoader
zhreshold Jan 25, 2020
4f29029
cache c++ datasets, samplers and batchify_fns
zhreshold Jan 25, 2020
9f36a20
add unittest
zhreshold Jan 25, 2020
2661677
fix getting all data from DataBatch
zhreshold Jan 26, 2020
913eece
fix first batch and clean up
zhreshold Jan 26, 2020
89432f3
fix datloader all together
zhreshold Jan 27, 2020
58d9fe3
fix last_batch=keep padding
zhreshold Jan 27, 2020
a4aaf9a
fix scalar batched data
zhreshold Jan 27, 2020
0cd97d9
MXDataIter ndarray type np/nd
zhreshold Jan 27, 2020
7ac0147
change Dataset GetItem to return vector
zhreshold Jan 28, 2020
a2e9ac1
fix capi name
zhreshold Jan 28, 2020
fae205d
add lazy transform dataset
zhreshold Jan 28, 2020
ce5a3e2
lazy transform syntax suger
zhreshold Jan 28, 2020
4acffda
wrap up syntax suger with identical dataloader support
zhreshold Jan 28, 2020
389eb38
add scope controller for convenience
zhreshold Jan 28, 2020
78aec77
fix compilation
zhreshold Jan 29, 2020
9c9cf0e
add gluon.data.Datasets to supported, add unittests
zhreshold Jan 29, 2020
936eb4b
cifar training successful
zhreshold Jan 30, 2020
e729a8f
fix transform_first closure with mxhandle
zhreshold Jan 30, 2020
e686d9b
try duplicate cachedop
zhreshold Feb 6, 2020
7082b4e
backup
zhreshold Feb 6, 2020
1326d48
fix random crop op
zhreshold Feb 8, 2020
49c99e5
add batchify capi
zhreshold Feb 11, 2020
9f487e2
fix batchify c_api
zhreshold Feb 13, 2020
85e1bdb
fix pickle batchify functions
zhreshold Feb 13, 2020
3eb0813
add batchify functions
zhreshold Feb 14, 2020
827e017
add appendBatchify
zhreshold Feb 14, 2020
28b2c42
fix batchify input array
zhreshold Feb 14, 2020
000b628
add RecordFileDataset in backend
zhreshold Feb 15, 2020
0954edf
add ImageRecordDataset to backend
zhreshold Feb 15, 2020
bad8417
fix ImageRecordFile decoding
zhreshold Feb 18, 2020
9811a91
fix prefetcher with dynamic shape arrays
zhreshold Feb 19, 2020
ec9bca4
add more tests
zhreshold Feb 20, 2020
5d146e6
fix potential dangerous index accessing
zhreshold Feb 20, 2020
10ef8ad
add arguments rather than scope
zhreshold Feb 20, 2020
6f53af6
add tests for nopython mode
zhreshold Feb 20, 2020
73c8697
try add bbox transforms
zhreshold Feb 21, 2020
28c6316
add PadBatchify, WIP for pad axis
zhreshold Feb 22, 2020
879252e
fix padding batchify, add unittests
zhreshold Feb 25, 2020
5139449
allow non boolean nopython flag
zhreshold Feb 27, 2020
42f3f5f
add operator for random_resized_crop
zhreshold Feb 28, 2020
fa9a8b9
fix cuda names
zhreshold Feb 28, 2020
4c1300d
fix reference before assignment for area
zhreshold Feb 28, 2020
b67ecf4
fix try_nopython behavior
zhreshold Feb 28, 2020
ec0a1f1
backup before reorg cached op
zhreshold Feb 28, 2020
09b03b9
add naive forward mode for cached op
zhreshold Feb 29, 2020
f88f514
fix dataloader internal dataset indexing
zhreshold Feb 29, 2020
cec1ac3
fix
zhreshold Feb 29, 2020
1358787
add skip_engine
zhreshold Mar 1, 2020
660352f
add resize check
zhreshold Mar 1, 2020
4d381c3
fix center_crop
zhreshold Mar 1, 2020
00f825b
allow 0 cropping bound
zhreshold Mar 1, 2020
7443ca0
allow 0 cropping bound
zhreshold Mar 1, 2020
0034f41
add profiler to dataloader
zhreshold Mar 1, 2020
2c5b408
omp batchify
zhreshold Mar 1, 2020
7cc635b
fix missing imagerecorddataset flag
zhreshold Mar 1, 2020
349b0aa
fix missing imagerecorddataset flag
zhreshold Mar 1, 2020
16dbfef
reorg c++ io
zhreshold Mar 2, 2020
a3a1c95
revert
zhreshold Mar 2, 2020
ffdfff0
remove engine copy
zhreshold Mar 3, 2020
d66ef96
fix test errors
zhreshold Mar 4, 2020
0f79fca
adding augmentations into vision.transforms
zhreshold Mar 4, 2020
42eca6e
reorg gluon vision transforms
zhreshold Mar 7, 2020
cfe291a
start bbox transforms
zhreshold Mar 7, 2020
0364a1d
add bbox utils
zhreshold Mar 10, 2020
a2ff894
Merge remote-tracking branch 'upstream/master' into cpp_data
zhreshold Mar 10, 2020
ae1391e
revert ndarray contrib.py
zhreshold Mar 10, 2020
56bb1c5
add more bbox util funcs
zhreshold Mar 10, 2020
ade29f3
add numpy vision test
zhreshold Mar 11, 2020
dc049af
fix unittests
zhreshold Mar 11, 2020
bed2608
reorg naive cached op
zhreshold Mar 12, 2020
421f81d
Merge remote-tracking branch 'origin/cpp_data' into cpp_dataset
zhreshold Mar 12, 2020
156 changes: 156 additions & 0 deletions include/mxnet/c_api.h
@@ -81,6 +81,14 @@ typedef void *ExecutorHandle;
typedef void *DataIterCreator;
/*! \brief handle to a DataIterator */
typedef void *DataIterHandle;
/*! \brief handle to a Dataset creator */
typedef void *DatasetCreator;
/*! \brief handle to a Dataset */
typedef void *DatasetHandle;
/*! \brief handle to a BatchifyFunction creator*/
typedef void *BatchifyFunctionCreator;
/*! \brief handle to a BatchifyFunction */
typedef void *BatchifyFunctionHandle;
/*! \brief handle to KVStore */
typedef void *KVStoreHandle;
/*! \brief handle to RecordIO */
@@ -2619,6 +2627,13 @@ MXNET_DLL int MXDataIterNext(DataIterHandle handle,
*/
MXNET_DLL int MXDataIterBeforeFirst(DataIterHandle handle);

/*!
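* \brief Call iterator.GetLenHint. Note that some iterators do not provide a length.
* \param handle the handle to iterator
* \param len the returned length hint as reported by GetLenHint, which may be
* negative when the length is unknown
* \return 0 when success, -1 when failure happens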
*/
MXNET_DLL int MXDataIterGetLenHint(DataIterHandle handle,
int64_t *len);
/*!
* \brief Get the handle to the NDArray of underlying data
* \param handle the handle pointer to the data iterator
@@ -2654,6 +2669,147 @@ MXNET_DLL int MXDataIterGetPadNum(DataIterHandle handle,
*/
MXNET_DLL int MXDataIterGetLabel(DataIterHandle handle,
NDArrayHandle *out);
/*!
* \brief Get the handles to all underlying NDArrays of the data iterator
* \param handle the handle pointer to the data iterator
* \param num_outputs the number of returned NDArray handles
* \param outputs pointer to an array of NDArray handles
* \return 0 when success, -1 when failure happens
*/
MXNET_DLL int MXDataIterGetItems(DataIterHandle handle,
int* num_outputs,
NDArrayHandle **outputs);

/*!
* \brief List all the available dataset entries
* \param out_size the size of returned datasets
* \param out_array the output dataset entries
* \return 0 when success, -1 when failure happens
*/
MXNET_DLL int MXListDatasets(uint32_t *out_size,
DatasetCreator **out_array);
/*!
* \brief Create a dataset, initialized with the given parameters
* \param handle handle of the dataset creator
* \param num_param number of parameters, i.e. the array size of keys and vals
* \param keys parameter keys
* \param vals parameter values
* \param out resulting dataset
* \return 0 when success, -1 when failure happens
*/
MXNET_DLL int MXDatasetCreateDataset(DatasetCreator handle,
uint32_t num_param,
const char **keys,
const char **vals,
DatasetHandle *out);
/*!
* \brief Get the detailed information about dataset.
* \param creator the DatasetCreator.
* \param name The returned name of the creator.
* \param description The returned description of the dataset.
* \param num_args Number of arguments.
* \param arg_names Name of the arguments.
* \param arg_type_infos Type information about the arguments.
* \param arg_descriptions Description information about the arguments.
* \return 0 when success, -1 when failure happens
*/
MXNET_DLL int MXDatasetGetDatasetInfo(DatasetCreator creator,
const char **name,
const char **description,
uint32_t *num_args,
const char ***arg_names,
const char ***arg_type_infos,
const char ***arg_descriptions);
/*!
* \brief Free the dataset handle
* \param handle the handle pointer to the dataset
* \return 0 when success, -1 when failure happens
*/
MXNET_DLL int MXDatasetFree(DatasetHandle handle);
/*!
* \brief Get the overall length (size) of the dataset
* \param handle the handle to dataset
* \param out return value of GetLen
* \return 0 when success, -1 when failure happens
*/
MXNET_DLL int MXDatasetGetLen(DatasetHandle handle,
uint64_t *out);
/*!
* \brief Get the output NDArrays of the item at the specified index
* \param handle the handle to dataset
* \param index the index of the dataset item to be retrieved
* \param num_outputs the number of output ndarrays
* \param outputs the pointers to handles of ndarrays
* \return 0 when success, -1 when failure happens
*/
MXNET_DLL int MXDatasetGetItems(DatasetHandle handle,
uint64_t index,
int* num_outputs,
NDArrayHandle **outputs);

/*!
* \brief List all the available batchify function entries
* \param out_size the size of returned batchify functions
* \param out_array the output batchify function entries
* \return 0 when success, -1 when failure happens
*/
MXNET_DLL int MXListBatchifyFunctions(uint32_t *out_size,
BatchifyFunctionCreator **out_array);
/*!
* \brief Create a batchify function, initialized with the given parameters
* \param handle handle of the batchify function creator
* \param num_param number of parameters, i.e. the array size of keys and vals
* \param keys parameter keys
* \param vals parameter values
* \param out resulting batchify function
* \return 0 when success, -1 when failure happens
*/
MXNET_DLL int MXBatchifyFunctionCreateFunction(BatchifyFunctionCreator handle,
uint32_t num_param,
const char **keys,
const char **vals,
BatchifyFunctionHandle *out);
/*!
* \brief Get the detailed information about batchify function.
* \param creator the BatchifyFunctionCreator.
* \param name The returned name of the creator.
* \param description The returned description of the batchify function.
* \param num_args Number of arguments.
* \param arg_names Name of the arguments.
* \param arg_type_infos Type information about the arguments.
* \param arg_descriptions Description information about the arguments.
* \return 0 when success, -1 when failure happens
*/
MXNET_DLL int MXBatchifyFunctionGetFunctionInfo(BatchifyFunctionCreator creator,
const char **name,
const char **description,
uint32_t *num_args,
const char ***arg_names,
const char ***arg_type_infos,
const char ***arg_descriptions);
/*!
* \brief Invoke the Batchify Function
* \param handle the handle pointer to the batchify function
* \param batch_size the batch size
* \param num_output the number of ndarrays for output
* \param inputs the pointers to input ndarrays
* \param outputs the pointers to output ndarrays
* \return 0 when success, -1 when failure happens
*/
MXNET_DLL int MXBatchifyFunctionInvoke(BatchifyFunctionHandle handle,
int batch_size,
int num_output,
NDArrayHandle *inputs,
NDArrayHandle **outputs);
/*!
* \brief Free the batchify function handle
* \param handle the handle pointer to the batchify function
* \return 0 when success, -1 when failure happens
*/
MXNET_DLL int MXBatchifyFunctionFree(BatchifyFunctionHandle handle);
//--------------------------------------------
// Part 6: basic KVStore interface
//--------------------------------------------
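The declarations added in this hunk all document the same convention: return 0 on success and -1 on failure, and write results through out-parameters. The following is a minimal Python sketch of how a caller might wrap that convention. It does not call into libmxnet: `fake_dataset_get_len`, `check_status`, `BackendError` and `dataset_len` are hypothetical names made up for illustration, and the "handle" is just a Python list.

```python
import ctypes


class BackendError(RuntimeError):
    """Raised when a C-style call reports failure (hypothetical name)."""


def check_status(ret):
    """Turn the 0 / -1 return convention into a Python exception."""
    if ret != 0:
        raise BackendError("backend call failed with status %d" % ret)


def fake_dataset_get_len(handle, out):
    """Stand-in for a call shaped like MXDatasetGetLen(handle, uint64_t *out).

    Assumption: `handle` is a plain Python list, not a real DatasetHandle.
    """
    if handle is None:
        return -1            # failure: there is nothing to measure
    out.value = len(handle)  # the result travels through the out-parameter
    return 0                 # success


def dataset_len(handle):
    """Allocate the out-parameter, make the call, and check the status code."""
    out = ctypes.c_uint64(0)
    check_status(fake_dataset_get_len(handle, out))
    return out.value
```

With a real shared library the out-parameter would be passed with `ctypes.byref(out)`; the sketch passes the object directly because the stand-in is a Python function.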
109 changes: 108 additions & 1 deletion include/mxnet/io.h
@@ -61,6 +61,13 @@ class IIterator : public dmlc::DataIter<DType> {
inline void SetDataName(const std::string data_name) {
data_names.push_back(data_name);
}
/*! \brief request iterator length hint for current epoch.
* Note that the returned value can be < 0, indicating
* that the length of iterator is unknown unless you went through all data.
*/
virtual int64_t GetLenHint(void) const {
return -1;
}
}; // class IIterator

/*! \brief a single data instance */
@@ -104,7 +111,7 @@ struct DataIteratorReg
*
* \code
* // example of registering a mnist iterator
- * REGISTER_IO_ITE(MNISTIter)
+ * REGISTER_IO_ITER(MNISTIter)
* .describe("Mnist data iterator")
* .set_body([]() {
* return new PrefetcherIter(new MNISTIter());
@@ -113,5 +120,105 @@ struct DataIteratorReg
*/
#define MXNET_REGISTER_IO_ITER(name) \
DMLC_REGISTRY_REGISTER(::mxnet::DataIteratorReg, DataIteratorReg, name)

/*!
* \brief A randomly accessible dataset which provides GetLen() and GetItem().
* Unlike DataIter, it is a static lookup storage which is friendly to random access.
* The dataset itself should NOT contain data processing, which should be applied during
* data augmentation or transformation processes.
*/
class Dataset {
public:
/*!
* \brief Initialize the Dataset by setting the parameters
* This function needs to be called before all other functions.
* \param kwargs the keyword arguments parameters
*/
virtual void Init(const std::vector<std::pair<std::string, std::string> >& kwargs) = 0;
/*!
* \brief Get the size of the dataset
*/
virtual uint64_t GetLen(void) const = 0;
/*!
* \brief Create a copy of dataset for threaded worker
*/
virtual Dataset* Clone(void) const = 0;
/*!
* \brief Get the ndarray items given index in dataset
* \param idx the integer index for required data
* \param ret the returned ndarray items
*/
virtual bool GetItem(uint64_t idx, std::vector<NDArray>& ret) = 0;
// virtual destructor
virtual ~Dataset(void) {}
}; // class Dataset

using DatasetPtr = std::shared_ptr<Dataset>;

/*! \brief typedef the factory function of dataset */
typedef std::function<Dataset *()> DatasetFactory;
/*!
* \brief Registry entry for Dataset factory functions.
*/
struct DatasetReg
: public dmlc::FunctionRegEntryBase<DatasetReg,
DatasetFactory> {
};
//--------------------------------------------------------------
// The following part is the API registration of Datasets
//--------------------------------------------------------------
/*!
* \brief Macro to register Datasets
*
* \code
* // example of registering an image sequence dataset
* MXNET_REGISTER_IO_DATASET(ImageSequenceDataset)
* .describe("image sequence dataset")
* .set_body([]() {
* return new ImageSequenceDataset();
* });
* \endcode
*/
#define MXNET_REGISTER_IO_DATASET(name) \
DMLC_REGISTRY_REGISTER(::mxnet::DatasetReg, DatasetReg, name)

class BatchifyFunction {
public:
/*! \brief Destructor */
virtual ~BatchifyFunction(void) {};
/*! \brief Init */
virtual void Init(const std::vector<std::pair<std::string, std::string> >& kwargs) = 0;
/*! \brief The batchify logic */
virtual bool Batchify(std::vector<std::vector<NDArray> >& inputs, std::vector<NDArray>& outputs) = 0;
}; // class BatchifyFunction

using BatchifyFunctionPtr = std::shared_ptr<BatchifyFunction>;

/*! \brief typedef the factory function of batchify function */
typedef std::function<BatchifyFunction *()> BatchifyFunctionFactory;
/*!
* \brief Registry entry for BatchifyFunction factory functions.
*/
struct BatchifyFunctionReg
: public dmlc::FunctionRegEntryBase<BatchifyFunctionReg,
BatchifyFunctionFactory> {
};
//--------------------------------------------------------------
// The following part is the API registration of Batchify Functions
//--------------------------------------------------------------
/*!
* \brief Macro to register Batchify Functions
*
* \code
* // example of registering a Batchify Function
* MXNET_REGISTER_IO_BATCHIFY_FUNCTION(StackBatchify)
* .describe("Stack Batchify Function")
* .set_body([]() {
* return new StackBatchify();
* });
* \endcode
*/
#define MXNET_REGISTER_IO_BATCHIFY_FUNCTION(name) \
DMLC_REGISTRY_REGISTER(::mxnet::BatchifyFunctionReg, BatchifyFunctionReg, name)
} // namespace mxnet
#endif // MXNET_IO_H_
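The two interfaces added to io.h split the work in two: a Dataset is random-access storage exposing only a length and an item lookup, and a BatchifyFunction turns a list of items into batched outputs. A rough pure-Python sketch of that division of labour follows. Everything here is an assumption for illustration: `ListDataset`, `stack_batchify` and `iter_batches` are hypothetical names, plain lists stand in for NDArrays, and "stacking" is reduced to grouping fields, which may differ from what the backend `StackBatchify` does.

```python
class ListDataset:
    """Random-access storage: only a length and an item lookup, no processing."""

    def __init__(self, samples):
        self._samples = list(samples)

    def get_len(self):
        return len(self._samples)

    def get_item(self, idx):
        # Each item is a list of fields, mirroring the vector filled by GetItem.
        return list(self._samples[idx])


def stack_batchify(items):
    """Group field k of every item together: N items of K fields -> K lists of N."""
    if not items:
        return []
    num_fields = len(items[0])
    if any(len(item) != num_fields for item in items):
        raise ValueError("all items must have the same number of fields")
    return [[item[k] for item in items] for k in range(num_fields)]


def iter_batches(dataset, batch_size):
    """Walk the dataset in index order, keeping a shorter final batch."""
    total = dataset.get_len()
    for start in range(0, total, batch_size):
        stop = min(start + batch_size, total)
        yield stack_batchify([dataset.get_item(i) for i in range(start, stop)])
```

Keeping the dataset free of processing, as the header comment asks, means the same storage can be paired with different batchify functions or transforms without being copied or changed.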
2 changes: 2 additions & 0 deletions python/mxnet/base.py
@@ -365,6 +365,8 @@ def _load_lib():
ExecutorHandle = ctypes.c_void_p
DataIterCreatorHandle = ctypes.c_void_p
DataIterHandle = ctypes.c_void_p
DatasetHandle = ctypes.c_void_p
BatchifyFunctionhandle = ctypes.c_void_p
KVStoreHandle = ctypes.c_void_p
RecordIOHandle = ctypes.c_void_p
RtcHandle = ctypes.c_void_p
20 changes: 20 additions & 0 deletions python/mxnet/gluon/contrib/data/vision/__init__.py
@@ -0,0 +1,20 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# coding: utf-8
# pylint: disable=wildcard-import
"""Contrib vision utilities."""
49 changes: 49 additions & 0 deletions python/mxnet/gluon/contrib/data/vision/transforms.py
@@ -0,0 +1,49 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# coding: utf-8
# pylint: disable=wildcard-import
"""Contrib vision transforms."""
import random
from ....block import Block, HybridBlock
from ....nn import Sequential, HybridSequential
from ..... import image
from .....base import numeric_types
from .....util import is_np_array


class BBoxRandomFlipLeftRight(HybridBlock):
    """Randomly flip the input image left to right with probability `prob`
    (0.5 by default).

    Inputs:
        - **data**: input tensor with (H x W x C) shape.

    Outputs:
        - **out**: output tensor with same shape as `data`.
    """
    def __init__(self, prob=0.5):
        super(BBoxRandomFlipLeftRight, self).__init__()
        self.prob = prob

    def hybrid_forward(self, F, x, y):
        if is_np_array():
            # NOTE: `width` is computed but not used yet, the boxes in `y` are
            # left untouched, and this branch does not return a value.
            width = F.npx.shape_array(x).split(3)[1]
            cond = F.np.random.uniform(low=0, high=1, size=1) < self.prob
            x = F.np.where(cond, F.npx.image.flip_left_right(x), x)
        else:
            raise NotImplementedError('Not implemented for non-np mode')
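The block above flips the image but does not yet touch the boxes passed as `y`. For reference, a horizontal flip of bounding boxes can be sketched in plain Python as below. This is an illustration under stated assumptions, not the PR's implementation: `flip_boxes_left_right` is a hypothetical helper, boxes are assumed to be `[xmin, ymin, xmax, ymax]` in pixel coordinates, and `width` is assumed to be the image width in pixels.

```python
def flip_boxes_left_right(boxes, width):
    """Mirror [xmin, ymin, xmax, ymax] boxes across the vertical image axis."""
    flipped = []
    for xmin, ymin, xmax, ymax in boxes:
        # The old right edge becomes the new left edge, and vice versa;
        # the vertical coordinates are unchanged.
        flipped.append([width - xmax, ymin, width - xmin, ymax])
    return flipped
```

Applying the function twice with the same `width` returns the original boxes, which is a convenient sanity check for any box flip.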
2 changes: 2 additions & 0 deletions python/mxnet/gluon/data/__init__.py
@@ -26,3 +26,5 @@
from .dataloader import *

from . import vision

from . import _internal