This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Pascal TitanX too many resources requested for launch #6775

@dtmoodie

Description

Environment info

Operating System:
Ubuntu 16.04

Compiler:
GCC 5.4

Package used (Python/R/Scala/Julia):
Python

Or if installed from source:

MXNet commit hash (git rev-parse HEAD):
0418aae16c2c6a01bf2e937d6e05596ec21e9087
8713d25 (0.10 release)

Python version and distribution:
2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609]

Error Message:

[19:20:36] /code/mxnet/src/executor/graph_executor.cc:558: Bucketing: data gt_boxes has a shape (1,123,5), which is larger than already allocated shape (1,100,5). Need to re-allocate. Consider putting default bucket key to be the bucket taking the largest input for better memory sharing.
(the bucketing warning above is repeated eight times, once per executor)
[19:20:36] /code/mxnet/dmlc-core/include/dmlc/logging.h:304: [19:20:36] /code/mxnet/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:110: Check failed: err == cudaSuccess (7 vs. 0) Name: MapPlanKernel ErrStr:too many resources requested for launch

Stack trace returned 9 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f81998b95dc]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow4cuda7MapPlanINS_2sv6plustoENS_6TensorINS_3gpuELi2EfEENS_4expr14Broadcast1DExpINS4_IS5_Li1EfEEfLi2ELi1EEEfEEvNS7_4PlanIT0_T2_EERKNSB_IT1_SD_EENS_5ShapeILi2EEEP11CUstream_st+0x1bc) [0x7f819a61351c]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(ZN5mxnet2op16FullyConnectedOpIN7mshadow3gpuEfE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x972) [0x7f819a614062]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(+0x6f7c19) [0x7f8199949c19]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x87) [0x7f819992b337]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x78) [0x7f819992fab8]
[bt] (6) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f81eb1d2c80]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f81ef2326ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f81eef6882d]

[19:20:36] /code/mxnet/dmlc-core/include/dmlc/logging.h:304: [19:20:36] /code/mxnet/src/engine/./threaded_engine.h:329: [19:20:36] /code/mxnet/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:110: Check failed: err == cudaSuccess (7 vs. 0) Name: MapPlanKernel ErrStr:too many resources requested for launch

Stack trace returned 9 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f81998b95dc]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow4cuda7MapPlanINS_2sv6plustoENS_6TensorINS_3gpuELi2EfEENS_4expr14Broadcast1DExpINS4_IS5_Li1EfEEfLi2ELi1EEEfEEvNS7_4PlanIT0_T2_EERKNSB_IT1_SD_EENS_5ShapeILi2EEEP11CUstream_st+0x1bc) [0x7f819a61351c]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(ZN5mxnet2op16FullyConnectedOpIN7mshadow3gpuEfE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x972) [0x7f819a614062]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(+0x6f7c19) [0x7f8199949c19]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x87) [0x7f819992b337]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x78) [0x7f819992fab8]
[bt] (6) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f81eb1d2c80]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f81ef2326ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f81eef6882d]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
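
The NaiveEngine suggestion in the message above amounts to setting the variable before mxnet is first imported, since the engine is chosen once at library startup. A minimal sketch (the mxnet import is commented out here; uncomment it in a real debugging session):

```python
import os

# Must be set before `import mxnet` so the engine type is picked up at startup.
os.environ['MXNET_ENGINE_TYPE'] = 'NaiveEngine'

# import mxnet as mx   # all operations now run synchronously; attach gdb here
# ... reproduce the failure to get a meaningful backtrace ...

# Unset afterwards so the default threaded engine is restored on the next run:
del os.environ['MXNET_ENGINE_TYPE']
```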

Stack trace returned 6 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f81998b95dc]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x31a) [0x7f819992b5ca]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x78) [0x7f819992fab8]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f81eb1d2c80]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f81ef2326ba]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f81eef6882d]

terminate called after throwing an instance of 'dmlc::Error'
what(): [19:20:36] /code/mxnet/src/engine/./threaded_engine.h:329: [19:20:36] /code/mxnet/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:110: Check failed: err == cudaSuccess (7 vs. 0) Name: MapPlanKernel ErrStr:too many resources requested for launch

Stack trace returned 9 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f81998b95dc]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow4cuda7MapPlanINS_2sv6plustoENS_6TensorINS_3gpuELi2EfEENS_4expr14Broadcast1DExpINS4_IS5_Li1EfEEfLi2ELi1EEEfEEvNS7_4PlanIT0_T2_EERKNSB_IT1_SD_EENS_5ShapeILi2EEEP11CUstream_st+0x1bc) [0x7f819a61351c]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(ZN5mxnet2op16FullyConnectedOpIN7mshadow3gpuEfE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x972) [0x7f819a614062]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(+0x6f7c19) [0x7f8199949c19]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x87) [0x7f819992b337]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x78) [0x7f819992fab8]
[bt] (6) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f81eb1d2c80]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f81ef2326ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f81eef6882d]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 6 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f81998b95dc]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x31a) [0x7f819992b5ca]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x78) [0x7f819992fab8]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f81eb1d2c80]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f81ef2326ba]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f81eef6882d]

Minimum reproducible example

```python

import argparse
import pprint
import mxnet as mx
import numpy as np
import glob
import sys
sys.path.append('/code/mxnet/example/rcnn/')
from rcnn.logger import logger
from rcnn.config import config, default, generate_config
from rcnn.symbol import *
from rcnn.core import callback, metric
from rcnn.core.loader import AnchorLoader
from rcnn.core.module import MutableModule
from rcnn.utils.load_data import load_gt_roidb, merge_roidb, filter_roidb
from rcnn.utils.load_model import load_param
from rcnn.dataset.imdb import IMDB
import xmltodict
from PIL import Image
import cPickle
import os
classes = ['human--person', 'human--rider--bicyclist', 'human--rider--motorcyclist',
'human--rider--other-rider', 'object--pothole', 'object--street-light', 'object--traffic-light',
'object--traffic-sign--back', 'object--traffic-sign--front', 'object--vehicle--bicycle',
'object--vehicle--boat', 'object--vehicle--bus', 'object--vehicle--car',
'object--vehicle--caravan', 'object--vehicle--motorcycle', 'object--vehicle--on-rails',
'object--vehicle--other-vehicle', 'object--vehicle--trailer', 'object--vehicle--truck',
'object--vehicle--wheeled-slow']

class mapillary(IMDB):
    def __init__(self, classes, image_set='training', root_path='./', data_path='./'):
        super(mapillary, self).__init__('mapillary', image_set, root_path, data_path)
        self.root_path = root_path
        self.image_set = image_set
        self.data_path = data_path
        self.classes = ['void'] + classes
        self.num_classes = len(self.classes)
        self.image_files = glob.glob(data_path + image_set + '/images/*')
        self.num_images = len(self.image_files)
        label_files = glob.glob(data_path + 'pre-processed-for-training/pascal_ssd/' + image_set + '/*')
        self.label_files = {}
        for lbl in label_files:
            self.label_files[os.path.splitext(os.path.basename(lbl))[0]] = lbl
        self.image_set_index = self.load_image_set_index()

    def load_image_set_index(self):
        """
        find out which indexes correspond to given image set (train or val)
        :return:
        """
        image_set_index = range(0, len(self.image_files))
        return image_set_index

    def image_path_from_index(self, index):
        """
        given image index, find out full path
        :param index: index of a specific image
        :return: full path of this image
        """
        image_file = self.image_files[index]
        assert os.path.exists(image_file), 'Path does not exist: {}'.format(image_file)
        return image_file

    def gt_roidb(self):
        """
        return ground truth image regions database
        :return: imdb[image_index]['boxes', 'gt_classes', 'gt_overlaps', 'flipped']
        """
        cache_file = os.path.join(self.cache_path, self.name + '_gt_roidb.pkl')
        if os.path.exists(cache_file):
            with open(cache_file, 'rb') as fid:
                roidb = cPickle.load(fid)
            logger.info('%s gt roidb loaded from %s' % (self.name, cache_file))
            return roidb

        gt_roidb = [self.load_pascal_annotation(index) for index in self.image_set_index]
        with open(cache_file, 'wb') as fid:
            cPickle.dump(gt_roidb, fid, cPickle.HIGHEST_PROTOCOL)
        logger.info('%s wrote gt roidb to %s' % (self.name, cache_file))

        return gt_roidb

    def load_pascal_annotation(self, image_index):
        image_path = self.image_files[image_index]
        name = os.path.splitext(os.path.basename(self.image_files[image_index]))[0]
        import xml.etree.ElementTree as ET
        roi_rec = dict()
        roi_rec['image'] = image_path
        im = Image.open(image_path)
        width, height = im.size
        roi_rec['height'] = height
        roi_rec['width'] = width

        tree = ET.parse(self.label_files[name])
        objs = tree.findall('object')

        num_objs = len(objs)

        boxes = np.zeros((num_objs, 4), dtype=np.uint16)
        gt_classes = np.zeros((num_objs), dtype=np.int32)
        overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)

        class_to_index = dict(zip(self.classes, range(self.num_classes)))
        # Load object bounding boxes into a data frame.
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            # Make pixel indexes 0-based
            x1 = float(bbox.find('xmin').text) - 1
            y1 = float(bbox.find('ymin').text) - 1
            x2 = float(bbox.find('xmax').text) - 1
            y2 = float(bbox.find('ymax').text) - 1
            cls = class_to_index[obj.find('name').text.lower().strip()]
            boxes[ix, :] = [x1, y1, x2, y2]
            gt_classes[ix] = cls
            overlaps[ix, cls] = 1.0

        roi_rec.update({'boxes': boxes,
                        'gt_classes': gt_classes,
                        'gt_overlaps': overlaps,
                        'max_classes': overlaps.argmax(axis=1),
                        'max_overlaps': overlaps.max(axis=1),
                        'flipped': False})
        return roi_rec

config.TRAIN.BATCH_IMAGES = 1
config.TRAIN.BATCH_ROIS = 128
config.TRAIN.END2END = True
config.TRAIN.BBOX_NORMALIZATION_PRECOMPUTED = True
ctx = [mx.gpu(int(i)) for i in range(8)]

network = default.network
default.pretrained = '/mnt/network_data/mxnet/models/vgg16'
import time
date = time.strftime("%Y-%m-%d")

if not os.path.exists(date):
    os.makedirs(date)
prefix = date + '/rcnn-' + network
print(prefix)
lr = 0.001
lr_step = '5'

sym = eval('get_' + network + '_train')(num_classes=config.NUM_CLASSES, num_anchors=config.NUM_ANCHORS)
feat_sym = sym.get_internals()['rpn_cls_score_output']

batch_size = len(ctx)
input_batch_size = config.TRAIN.BATCH_IMAGES * batch_size

logger.info(pprint.pformat(config))

image_sets = [mapillary(classes), mapillary(classes, 'validation')]

roidbs = [image_set.gt_roidb() for image_set in image_sets]
roidb = merge_roidb(roidbs)
roidb = filter_roidb(roidb)

train_data = AnchorLoader(feat_sym, roidb, batch_size=input_batch_size, shuffle=True,
                          ctx=ctx, work_load_list=None,
                          feat_stride=config.RPN_FEAT_STRIDE, anchor_scales=config.ANCHOR_SCALES,
                          anchor_ratios=config.ANCHOR_RATIOS, aspect_grouping=config.TRAIN.ASPECT_GROUPING)

max_data_shape = [('data', (input_batch_size, 3, max([v[0] for v in config.SCALES]), max([v[1] for v in config.SCALES])))]
max_data_shape, max_label_shape = train_data.infer_shape(max_data_shape)
max_data_shape.append(('gt_boxes', (input_batch_size, 100, 5)))
logger.info('providing maximum shape %s %s' % (max_data_shape, max_label_shape))

data_shape_dict = dict(train_data.provide_data + train_data.provide_label)
arg_shape, out_shape, aux_shape = sym.infer_shape(**data_shape_dict)
arg_shape_dict = dict(zip(sym.list_arguments(), arg_shape))
out_shape_dict = dict(zip(sym.list_outputs(), out_shape))
aux_shape_dict = dict(zip(sym.list_auxiliary_states(), aux_shape))
logger.info('output shape %s' % pprint.pformat(out_shape_dict))

begin_epoch = 0
end_epoch = default.e2e_epoch
arg_params, aux_params = load_param(default.pretrained, default.pretrained_epoch, convert=True)
arg_params['rpn_conv_3x3_weight'] = mx.random.normal(0, 0.01, shape=arg_shape_dict['rpn_conv_3x3_weight'])
arg_params['rpn_conv_3x3_bias'] = mx.nd.zeros(shape=arg_shape_dict['rpn_conv_3x3_bias'])
arg_params['rpn_cls_score_weight'] = mx.random.normal(0, 0.01, shape=arg_shape_dict['rpn_cls_score_weight'])
arg_params['rpn_cls_score_bias'] = mx.nd.zeros(shape=arg_shape_dict['rpn_cls_score_bias'])
arg_params['rpn_bbox_pred_weight'] = mx.random.normal(0, 0.01, shape=arg_shape_dict['rpn_bbox_pred_weight'])
arg_params['rpn_bbox_pred_bias'] = mx.nd.zeros(shape=arg_shape_dict['rpn_bbox_pred_bias'])
arg_params['cls_score_weight'] = mx.random.normal(0, 0.01, shape=arg_shape_dict['cls_score_weight'])
arg_params['cls_score_bias'] = mx.nd.zeros(shape=arg_shape_dict['cls_score_bias'])
arg_params['bbox_pred_weight'] = mx.random.normal(0, 0.001, shape=arg_shape_dict['bbox_pred_weight'])
arg_params['bbox_pred_bias'] = mx.nd.zeros(shape=arg_shape_dict['bbox_pred_bias'])

for k in sym.list_arguments():
    if k in data_shape_dict:
        continue
    assert k in arg_params, k + ' not initialized'
    assert arg_params[k].shape == arg_shape_dict[k], \
        'shape inconsistent for ' + k + ' inferred ' + str(arg_shape_dict[k]) + ' provided ' + str(arg_params[k].shape)
for k in sym.list_auxiliary_states():
    assert k in aux_params, k + ' not initialized'
    assert aux_params[k].shape == aux_shape_dict[k], \
        'shape inconsistent for ' + k + ' inferred ' + str(aux_shape_dict[k]) + ' provided ' + str(aux_params[k].shape)

fixed_param_prefix = config.FIXED_PARAMS
data_names = [k[0] for k in train_data.provide_data]
label_names = [k[0] for k in train_data.provide_label]
mod = MutableModule(sym, data_names=data_names, label_names=label_names,
                    logger=logger, context=ctx, work_load_list=None,
                    max_data_shapes=max_data_shape, max_label_shapes=max_label_shape,
                    fixed_param_prefix=fixed_param_prefix)

rpn_eval_metric = metric.RPNAccMetric()
rpn_cls_metric = metric.RPNLogLossMetric()
rpn_bbox_metric = metric.RPNL1LossMetric()
eval_metric = metric.RCNNAccMetric()
cls_metric = metric.RCNNLogLossMetric()
bbox_metric = metric.RCNNL1LossMetric()
eval_metrics = mx.metric.CompositeEvalMetric()
for child_metric in [rpn_eval_metric, rpn_cls_metric, rpn_bbox_metric, eval_metric, cls_metric, bbox_metric]:
    eval_metrics.add(child_metric)

batch_end_callback = callback.Speedometer(train_data.batch_size, frequent=default.frequent)
means = np.tile(np.array(config.TRAIN.BBOX_MEANS), config.NUM_CLASSES)
stds = np.tile(np.array(config.TRAIN.BBOX_STDS), config.NUM_CLASSES)
epoch_end_callback = callback.do_checkpoint(prefix, means, stds)

base_lr = lr
lr_factor = 0.1
lr_epoch = [int(epoch) for epoch in lr_step.split(',')]
lr_epoch_diff = [epoch - begin_epoch for epoch in lr_epoch if epoch > begin_epoch]
lr = base_lr * (lr_factor ** (len(lr_epoch) - len(lr_epoch_diff)))
lr_iters = [int(epoch * len(roidb) / batch_size) for epoch in lr_epoch_diff]
logger.info('lr %f lr_epoch_diff %s lr_iters %s' % (lr, lr_epoch_diff, lr_iters))
lr_scheduler = mx.lr_scheduler.MultiFactorScheduler(lr_iters, lr_factor)

# optimizer

optimizer_params = {'momentum': 0.9,
                    'wd': 0.0005,
                    'learning_rate': lr,
                    'lr_scheduler': lr_scheduler,
                    'rescale_grad': (1.0 / batch_size),
                    'clip_gradient': 5}

# train

mod.fit(train_data, eval_metric=eval_metrics, epoch_end_callback=epoch_end_callback,
        batch_end_callback=batch_end_callback, kvstore=default.kvstore,
        optimizer='sgd', optimizer_params=optimizer_params,
        arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
```
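
As a side note, the bucketing warnings in the log come from the `('gt_boxes', (input_batch_size, 100, 5))` cap in the script above: any image with more than 100 ground-truth boxes (here 123) forces a re-allocation. One way to size that bucket from the dataset itself is sketched below; `max_gt_boxes_shape` is a hypothetical helper, not part of the rcnn example code:

```python
def max_gt_boxes_shape(roidb, batch_images, box_dim=5):
    """Size the gt_boxes bucket from the largest annotation in the roidb.

    Assumes each roidb record carries a 'boxes' array, as built in
    load_pascal_annotation above.
    """
    max_boxes = max(len(rec['boxes']) for rec in roidb)
    return ('gt_boxes', (batch_images, max_boxes, box_dim))

# Toy records standing in for real annotations (3 and 123 boxes):
toy_roidb = [{'boxes': [[0, 0, 1, 1]] * 3},
             {'boxes': [[0, 0, 1, 1]] * 123}]
print(max_gt_boxes_shape(toy_roidb, 1))  # ('gt_boxes', (1, 123, 5))
```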

What have you tried to solve it?

Rebuild from newest git pull.
Rebuild from 8713d25 (0.10 release)
Changed mshadow::cuda::kMaxThreadsPerBlock to 256.

  • Then throws an error in MapRedKeepLowestKernel, because it still tries to launch a kernel with 1024 threads, which fails in CheckLaunchParam.
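
For context on the workaround: cudaErrorLaunchOutOfResources (error 7) means a block's total resource demand, typically registers per thread times block size, exceeds what one SM can supply, so shrinking the thread cap trades occupancy for a launchable configuration. A rough sketch of the arithmetic (function names and the 80-register figure are illustrative assumptions, not mshadow's actual code; 65536 is the per-block register limit on Pascal):

```python
def launch_config(n_elems, max_threads_per_block):
    """1-D grid sizing in the style of mshadow's MapPlanKernel launch."""
    threads = min(max_threads_per_block, n_elems)
    blocks = (n_elems + threads - 1) // threads  # ceil division
    return blocks, threads

def fits(threads, regs_per_thread, regs_per_block=65536):
    """A block launches only if its total register demand fits the budget."""
    return threads * regs_per_thread <= regs_per_block

# A register-hungry kernel (say 80 regs/thread) cannot launch 1024-thread
# blocks (80 * 1024 = 81920 > 65536) but can launch 256-thread ones:
print(launch_config(5000, 1024), fits(1024, 80))  # (5, 1024) False
print(launch_config(5000, 256), fits(256, 80))    # (20, 256) True
```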
