This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Model inference outputs not matching across MXNet versions using save_params and load_params #11961

@piyushghai

Description

Model inference outputs do not match when a model is saved on MXNet v1.0.0 and the same model is loaded on MXNet v1.3.0 (built from the latest master branch).
The error was observed in the CI run for the Model Backwards Compatibility Check. The run is available here: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-backwards-compatibility-checker/detail/restricted-backwards-compatibility-checker/1/pipeline/

Error Message:

  File "model_backwards_compat_inference.py", line 135, in <module>
    test_lenet_gluon_load_params_api()
  File "model_backwards_compat_inference.py", line 72, in test_lenet_gluon_load_params_api
    assert_almost_equal(old_inference_results.asnumpy(), output.asnumpy())
  File "/work/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
Error 1.969787 exceeds tolerance rtol=0.000010, atol=0.000000.  Location of maximum error:(5, 0), a=0.003014, b=0.003014
 a: array([[ 0.01743407, -0.30903903],
       [ 0.08352755, -0.365019  ],
       [ 0.06324662, -0.4323489 ],...
 b: array([[ 0.01743408, -0.3090391 ],
       [ 0.08352771, -0.365019  ],
       [ 0.06324662, -0.4323488 ],...
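
To help read the "Error 1.969787" figure, here is a minimal NumPy-only sketch of how such a number could be computed. It assumes the reported error is the largest ratio of |a - b| to the allowed tolerance atol + rtol * |b|; that is my reading of mxnet.test_utils and is not confirmed in this issue. The helper name max_violation is hypothetical, and the arrays are just the three rows visible in the truncated log.

```python
import numpy as np


def max_violation(a, b, rtol=1e-5, atol=0.0):
    """Return (index, ratio) for the entry where |a - b| is largest
    relative to the allowed tolerance atol + rtol * |b|.

    Hypothetical helper: an assumed reading of how the "Error" value
    in the log is derived, not the library's confirmed implementation.
    """
    diff = np.abs(a - b)
    tol = atol + rtol * np.abs(b)
    ratio = diff / (tol + 1e-20)  # small constant guards against division by zero
    flat = int(np.argmax(ratio))
    idx = tuple(int(i) for i in np.unravel_index(flat, ratio.shape))
    return idx, float(ratio[idx])


# The three rows printed in the truncated log above (float32, like the model outputs).
a = np.array([[0.01743407, -0.30903903],
              [0.08352755, -0.365019],
              [0.06324662, -0.4323489]], dtype=np.float32)
b = np.array([[0.01743408, -0.3090391],
              [0.08352771, -0.365019],
              [0.06324662, -0.4323488]], dtype=np.float32)

idx, ratio = max_violation(a, b)
print(idx, ratio)  # a ratio above 1.0 would mean the tolerance is exceeded
```

Under this reading, the rows shown in the log stay within tolerance. The reported failure is at (5, 0), which the truncated output does not show, where both values print as 0.003014. With atol=0 the check is purely relative, so at that magnitude the allowed difference is only about 3e-8, and an error of 1.969787 would correspond to an absolute difference of roughly 6e-8.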

Minimum reproducible example

Perform the training and save the model on MXNet 1.0.0.
Model Definition:

import mxnet as mx
from mxnet import gluon
from mxnet import ndarray as F  # F.tanh below refers to the NDArray operator
from mxnet.gluon import nn


class Net(gluon.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        with self.name_scope():
            # layers created in name_scope will inherit name space
            # from parent layer.
            self.conv1 = nn.Conv2D(20, kernel_size=(5, 5))
            self.pool1 = nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2))
            self.conv2 = nn.Conv2D(50, kernel_size=(5, 5))
            self.pool2 = nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2))
            self.fc1 = nn.Dense(500)
            self.fc2 = nn.Dense(2)

    def forward(self, x):
        x = self.pool1(F.tanh(self.conv1(x)))
        x = self.pool2(F.tanh(self.conv2(x)))
        # 0 means copy over size from corresponding dimension.
        # -1 means infer size from the rest of dimensions.
        x = x.reshape((0, -1))
        x = F.tanh(self.fc1(x))
        x = F.tanh(self.fc2(x))
        return x

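Training code snippet:

import logging
import os

import numpy as np

# create_model_folder, get_model_path and save_inference_results are
# helpers from the test suite; their definitions are not shown in this issue.
def train_lenet_gluon_save_params_api():
    model_name = 'lenet_gluon_save_params_api'
    create_model_folder(model_name)
    logging.info('Saving files for model %s' % model_name)
    net = Net()
    # No optimization step is run: the weights are Xavier-initialized and saved as-is.
    weights = mx.initializer.Xavier(magnitude=2.57)
    net.initialize(weights, ctx=[mx.cpu(0)])

    # Prepare random input data and record the forward-pass output
    test_data = mx.nd.array(np.random.uniform(-1, 1, size=(20, 1, 30, 30)))
    output = net(test_data)

    # Save the input data, the reference output, and the parameters
    mx.nd.save(os.path.join(get_model_path(model_name), ''.join([model_name, '-data'])), {'data': test_data})
    save_inference_results(output, model_name)
    net.save_params(os.path.join(get_model_path(model_name), ''.join([model_name, '-params'])))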

Model inference is to be performed on MXNet built from source from the latest master.
Inference snippet:

def test_lenet_gluon_load_params_api():
    from mxnet.test_utils import assert_almost_equal
    model_name = 'lenet_gluon_save_params_api'
    logging.info('Performing inference for model/API %s' % model_name)

    data = mx.nd.load(''.join([model_name, '-data']))
    test_data = data['data']
    # Load the model and perform inference
    loaded_model = Net()
    loaded_model.load_params(model_name + '-params')
    output = loaded_model(test_data)
    old_inference_results = mx.nd.load(model_name + '-inference')['inference']
    assert_almost_equal(old_inference_results.asnumpy(), output.asnumpy())
    logging.info('=================================')
    logging.info('Assertion passed for model : %s' % model_name)

Steps to reproduce

  1. Run the training on MXNet 1.0.0 (installed via pip install mxnet==1.0.0).
  2. Run the inference on MXNet 1.3.0 (built from source using the latest master branch).

This regression is not seen with models trained on v1.1.0 or v1.2.0 when inference is performed on v1.3.0.
