Description
Model inference outputs do not match when a model is saved on MXNet v1.0.0 and the same model is loaded on MXNet v1.3.0 (built from the latest master branch).
The error was observed in the CI run for the Model Backwards Compatibility Check and is available here: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-backwards-compatibility-checker/detail/restricted-backwards-compatibility-checker/1/pipeline/
Error Message:
Items are not equal:
  File "model_backwards_compat_inference.py", line 135, in <module>
    test_lenet_gluon_load_params_api()
  File "model_backwards_compat_inference.py", line 72, in test_lenet_gluon_load_params_api
    assert_almost_equal(old_inference_results.asnumpy(), output.asnumpy())
  File "/work/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 1.969787 exceeds tolerance rtol=0.000010, atol=0.000000. Location of maximum error:(5, 0), a=0.003014, b=0.003014
 a: array([[ 0.01743407, -0.30903903],
       [ 0.08352755, -0.365019  ],
       [ 0.06324662, -0.4323489 ],...
 b: array([[ 0.01743408, -0.3090391 ],
       [ 0.08352771, -0.365019  ],
       [ 0.06324662, -0.4323488 ],...
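To put the reported "Error 1.969787" in perspective, the sketch below computes a per-element violation ratio |a - b| / (atol + rtol * |b|), where a ratio above 1 fails the check. This assumes that `assert_almost_equal` in this MXNet version uses the same tolerance form as `numpy.testing.assert_allclose`; the helper name `max_violation` is hypothetical. The input rows are the three rows visible in the truncated printout above; the failing element at (5, 0) is not among them.

```python
import numpy as np


def max_violation(a, b, rtol=1e-5, atol=0.0):
    """Return (index, ratio) of the worst element, ratio = |a - b| / (atol + rtol * |b|).

    Hypothetical helper; assumes the same tolerance form as numpy.testing.assert_allclose.
    A ratio above 1 means the element is outside the tolerance.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    diff = np.abs(a - b)
    tol = atol + rtol * np.abs(b)
    ratio = diff / (tol + 1e-20)  # tiny guard against division by zero when b == 0
    idx = np.unravel_index(np.argmax(ratio), ratio.shape)
    return idx, float(ratio[idx])


# The three rows visible in the truncated printout of the failing run
a = [[0.01743407, -0.30903903], [0.08352755, -0.365019], [0.06324662, -0.4323489]]
b = [[0.01743408, -0.3090391], [0.08352771, -0.365019], [0.06324662, -0.4323488]]

idx, ratio = max_violation(a, b)
print(idx, ratio)  # worst visible element is (1, 0), ratio ~0.19, so these rows pass

# Absolute difference implied by the reported failure at (5, 0), approximate because
# b is only printed to six decimals: ratio * (rtol * |b|)
implied_diff = 1.969787 * (1e-5 * 0.003014)
print(implied_diff)  # roughly 5.9e-8
```

Under that assumption, an element with |b| of about 0.003 is allowed an absolute difference of only about 3e-8 at rtol=1e-5, so the failure corresponds to an absolute difference of roughly 6e-8.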
Minimal reproducible example
Perform the training and save the model on MXNet 1.0.0.
Model Definition:
import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn
# F refers to the NDArray API here (assumed; the original script's import is not shown)
from mxnet import ndarray as F


class Net(gluon.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        with self.name_scope():
            # layers created in name_scope will inherit name space
            # from parent layer.
            self.conv1 = nn.Conv2D(20, kernel_size=(5, 5))
            self.pool1 = nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2))
            self.conv2 = nn.Conv2D(50, kernel_size=(5, 5))
            self.pool2 = nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2))
            self.fc1 = nn.Dense(500)
            self.fc2 = nn.Dense(2)

    def forward(self, x):
        x = self.pool1(F.tanh(self.conv1(x)))
        x = self.pool2(F.tanh(self.conv2(x)))
        # 0 means copy over size from corresponding dimension.
        # -1 means infer size from the rest of dimensions.
        x = x.reshape((0, -1))
        x = F.tanh(self.fc1(x))
        x = F.tanh(self.fc2(x))
        return x
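For the 30x30 single-channel inputs used in the training snippet, the feature-map sizes through this network can be worked out by hand. The sketch below does that arithmetic in plain Python, assuming the Gluon defaults for these layers (stride 1 and no padding for `Conv2D`, floor rounding for `MaxPool2D`). The helper names `out_size` and `lenet_feature_sizes` are hypothetical and only illustrate the shape flow; they are not part of the test scripts.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv/pool window, using floor rounding."""
    return (size + 2 * pad - kernel) // stride + 1


def lenet_feature_sizes(height=30, width=30):
    """Per-sample output shape of each stage of Net for an (N, 1, height, width) input."""
    sizes = {}
    h, w = out_size(height, 5), out_size(width, 5)   # conv1: 20 filters, 5x5 kernel
    sizes["conv1"] = (20, h, w)
    h, w = out_size(h, 2, 2), out_size(w, 2, 2)      # pool1: 2x2 window, stride 2
    sizes["pool1"] = (20, h, w)
    h, w = out_size(h, 5), out_size(w, 5)            # conv2: 50 filters, 5x5 kernel
    sizes["conv2"] = (50, h, w)
    h, w = out_size(h, 2, 2), out_size(w, 2, 2)      # pool2: 2x2 window, stride 2
    sizes["pool2"] = (50, h, w)
    sizes["flatten"] = 50 * h * w                    # x.reshape((0, -1))
    sizes["fc1"] = 500
    sizes["fc2"] = 2
    return sizes


for stage, shape in lenet_feature_sizes().items():
    print(stage, shape)
# conv1 (20, 26, 26), pool1 (20, 13, 13), conv2 (50, 9, 9),
# pool2 (50, 4, 4), flatten 800, fc1 500, fc2 2
```

With a batch of 20 inputs the network output therefore has shape (20, 2), which matches the two-column arrays printed in the error message.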
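Training code snippet:
import logging
import os

import mxnet as mx
import numpy as np

# Net is the model definition above. create_model_folder, get_model_path and
# save_inference_results are helpers from the backwards-compatibility test scripts (not shown).


def train_lenet_gluon_save_params_api():
    model_name = 'lenet_gluon_save_params_api'
    create_model_folder(model_name)
    logging.info('Saving files for model %s' % model_name)
    net = Net()
    weights = mx.initializer.Xavier(magnitude=2.57)
    net.initialize(weights, ctx=[mx.cpu(0)])
    # Prepare data: a random batch of 20 single-channel 30x30 inputs
    test_data = mx.nd.array(np.random.uniform(-1, 1, size=(20, 1, 30, 30)))
    output = net(test_data)
    # Save the input, the reference outputs and the parameters for the later inference run
    mx.nd.save(os.path.join(get_model_path(model_name), ''.join([model_name, '-data'])), {'data': test_data})
    save_inference_results(output, model_name)
    net.save_params(os.path.join(get_model_path(model_name), ''.join([model_name, '-params'])))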
Model inference is to be performed on MXNet built from source from the latest master branch.
Inference snippet:
import logging

import mxnet as mx

# Net is the model definition above.


def test_lenet_gluon_load_params_api():
    from mxnet.test_utils import assert_almost_equal
    model_name = 'lenet_gluon_save_params_api'
    logging.info('Performing inference for model/API %s' % model_name)
    # Load the input data saved by the training run
    data = mx.nd.load(''.join([model_name, '-data']))
    test_data = data['data']
    # Load the model and perform inference
    loaded_model = Net()
    loaded_model.load_params(model_name + '-params')
    output = loaded_model(test_data)
    # Compare against the outputs recorded on the older MXNet version
    old_inference_results = mx.nd.load(model_name + '-inference')['inference']
    assert_almost_equal(old_inference_results.asnumpy(), output.asnumpy())
    logging.info('=================================')
    logging.info('Assertion passed for model : %s' % model_name)
Steps to reproduce
- Run the training on MXNet 1.0.0 (installed via pip install mxnet==1.0.0).
- Run the inference on MXNet 1.3.0 (built from source using the latest master branch).
This regression is not seen for models trained on v1.1.0 or v1.2.0 when inference is performed on v1.3.0.