[NNVM][TEST] Test against numerical grad #1505
Conversation
@kazum @kevinthesun @srkreddy1238 could you please review this PR?
srkreddy1238 left a comment:
@sgrechanik-h
This is a great start toward standardizing the helpers across the various test cases 👍
A few comments to allow non-numpy forward/backward callers.
Otherwise this PR looks great; waiting to get this merged so that all other test cases can use it instead of local helpers.
| None (default) means that all input variables will be used. May be a
| list of pairs `(var_name, shape)`.
| np_forward : Callable[..., List[numpy.ndarray]], optional
Suggest giving np_forward a different name: it need not always be numpy; it could be an implementation from any of the frontends we have.
Also have this function receive a flexible parameter `attr` (a dictionary with default value None) which can carry implementation-specific data like session info, etc.
| logging.debug(debug_stage)
| numpy_res = np_forward(**np_inputs_without_head_grads)
| np.testing.assert_allclose(nnvm_res[0], numpy_res, atol=atol, rtol=rtol)
In the context of the above comment on np_forward: please check whether the result from np_forward is a list, and if so, compare all arrays in the list.
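A minimal sketch of what that could look like, reusing the names from the quoted snippet (illustrative, not the final implementation):

import numpy as np

def compare_with_reference(nnvm_res, np_forward, np_inputs, atol, rtol):
    # Normalize the reference result to a list so that single-output and
    # multi-output reference functions are compared uniformly.
    numpy_res = np_forward(**np_inputs)
    if not isinstance(numpy_res, list):
        numpy_res = [numpy_res]
    for i, np_out in enumerate(numpy_res):
        np.testing.assert_allclose(nnvm_res[i], np_out, atol=atol, rtol=rtol)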
| dump_graph : bool
| Dump the graph even on success.
| """
Not sure an option for a custom comparison function makes sense here. Just raising it as it crossed my mind.
| only_targets : Set[str]
| Test only for those targets from `ctx_list()` that are also in this set.
| numerical_grads : bool or "if_possible"
How about folding the "if_possible" case into numerical_grads=True?
| then check gradients numerically only if this graph can be created (i.e. if there are some
| operations with unimplemented gradients, it will just issue a warning).
| delta : float
Add "optional" to the optional arguments in the docstring.
| shape = shape.copy() if shape else {}
| grad_input_vars_real = []
| for x in grad_input_vars:
This logic looks complicated. How about just passing a list of symbols as grad_input_vars? Shape information can be given in the shape argument (maybe an input_shapes dict from str to tuple?) so that we don't have shapes in multiple places.
Yes, not having shapes in grad_input_vars would be cleaner, I'll fix it.
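A sketch of the cleaner interface being agreed on here (the call is illustrative; check_function also needs a reference function or an explicit numerical_grads=True, see the discussion below):

import nnvm.symbol as sym
from nnvm.testing import check_function

x = sym.Variable("x")
y = sym.Variable("y")
z = x + y
# Symbols only in grad_input_vars; shapes live in the shape dict.
check_function(z, grad_input_vars=[x, y],
               shape={"x": (1, 2), "y": (1, 2)}, numerical_grads=True)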
| dtypes = graph.json_attr('dtype')
| if shapes is None or dtypes is None:
|     graph = graph.apply('InferShape').apply('InferType')
Here we assume set_shape_inputs and set_dtype_inputs have been called. It might be better to pass input_shapes and input_types directly into this function to make it more general.
| forward_graph = nnvm.graph.create(symbol)
| if dtype is not None:
We need to call set_dtype_inputs; just check that dtype is not None.
Sorry, I think I didn't understand this one.
If dtype is None, graph_to_function will raise an exception? The same will happen for shape.
How about making dtype optional but with a non-None default, such as dtype="float32"? For the shape argument we should force the user to pass a value. Then the code logic can be simpler.
Setting dtype='float32' as the default would simplify a lot of tests a little bit. However, if some variables have their dtypes specified by attributes, then either those attributes should take precedence or an error should be raised, so I'm not sure.
OK. Maybe we can put the shape/dtype check right in this function, before shape/dtype inference, and check that either the input variables' shape/dtype attributes are set or the shape/dtype arguments are passed.
Since shapes and dtypes of some inputs may be inferred from shapes and dtypes of other inputs, the proper solution would be to run inference passes, and only then check if some inputs are left without types. Then these inputs can be assigned the default type.
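A sketch of that inference-then-fallback approach (the helper name and index lookups are illustrative; set_dtype_inputs and TCODE_TO_DTYPE are the graph_attr utilities quoted elsewhere in this thread):

import nnvm
from nnvm.compiler import graph_attr

def infer_input_dtypes(symbol, dtype=None, fallback="float32"):
    # Run type inference first so dtypes can propagate between inputs,
    # then assign the fallback only to inputs that are still untyped.
    graph = nnvm.graph.create(symbol)
    if dtype is not None:
        graph_attr.set_dtype_inputs(graph, dtype)
    graph = graph.apply('InferType')
    tcodes = graph.json_attr('dtype')
    idx = graph.index
    result = {}
    for name in symbol.list_input_names():
        t = tcodes[idx.entry_id(name)]
        # TCODE_TO_DTYPE maps the "unknown" code -1 to None.
        result[name] = graph_attr.TCODE_TO_DTYPE[t] or fallback
    return result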
| # nnvm_res contains the output and gradients (if they are needed)
| nnvm_res = main_function(**np_inputs)
| if np_forward is not None:
If np_forward is None, the forward comparison will be skipped? I think we should enforce that np_forward is set.
Sometimes we might want to check gradients only (or even only numerical gradients). But I think it is a good idea to require passing numerical_grads=True explicitly if no reference function is specified.
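A sketch of the "fail when no action is requested explicitly" rule that came out of this exchange (the argument handling is illustrative):

def validate_request(forward, backward, numerical_grads):
    # Require at least one explicit check so a misconfigured call
    # cannot silently test nothing.
    if forward is None and backward is None and not numerical_grads:
        raise ValueError("No action requested: pass reference forward/backward "
                         "functions or set numerical_grads=True explicitly")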
| # Since the result may be non-scalar, we have to put another operation on top,
| # so we just multiply by the randomly generated head_grads.
| # This way we can reuse the gradient values which have already been computed.
| def function(**kwargs):
Change to a more meaningful name.
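For context, a sketch of the trick the quoted comment describes: contracting a non-scalar output with random head_grads yields a scalar whose numerical gradient matches the symbolic gradients contracted with the same head_grads (names are illustrative):

import numpy as np

def scalarize(function, head_grads):
    # d(sum(f(x) * head_grads))/dx is the Jacobian of f contracted with
    # head_grads, which is exactly what the backward pass returns.
    def scalar_function(**kwargs):
        return np.sum(function(**kwargs) * head_grads)
    return scalar_function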
| ----------
| function
| A function that takes inputs as keyword arguments (like `function(**input_values)`) and
| returns a scalar result. Should accept and return numpy arrays.
Description is not quite clear here. Scalar or ndarray?
| Relative tolerance.
| """
| if function_value is None:
Could we just compute this value instead of taking it as an argument?
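That is, a sketch of the suggested simplification:

# If the caller didn't supply a precomputed value, compute it here.
if function_value is None:
    function_value = function(**input_values)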
| inputs = [('x', dshape, x)]
| helper(y, inputs, dtype, forward, backward)
| inputs = [(x, dshape)]
| check_function(y, inputs, forward, backward, dtype=dtype)
Here we are setting grad_input_vars=inputs, but inputs are actually input shapes. It would be clearer if we refactored the interface of check_function.
| def check_numerical_grads(function, grad_input_vars, input_values, grad_values, function_value=None,
|                           delta=1e-3, max_error=1e+3, max_discarded_frac=0.1,
|                           atol=1e-2, rtol=1e-2):
|     """A helper function that checks that numerical gradients of a function are equal to
Since this is new and will affect all backward operators, can you give more explanation of it?
If developers add new backward ops and the check fails, how can they quickly debug the issue? Since this is not simply a comparison with a numpy result, maybe we can have a tutorial for it. This would also be good for the stability of CI as we introduce more and more backward ops.
| if grad.shape != input_values[x_name].shape:
|     raise AssertionError(
|         "Gradient wrt '{}' has unexpected shape {}, expected {} "
|         .format(x.attr('name'), grad.shape, input_values[x_name].shape))
x is not defined here. x.attr('name') should be x_name.
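The corrected snippet, applying this fix:

if grad.shape != input_values[x_name].shape:
    raise AssertionError(
        "Gradient wrt '{}' has unexpected shape {}, expected {} "
        .format(x_name, grad.shape, input_values[x_name].shape))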
| def check_numerical_grads(function, grad_input_vars, input_values, grad_values, function_value=None,
|                           delta=1e-3, max_error=1e+3, max_discarded_frac=0.1,
The default value of delta here is different from check_function's. Should it be 1e-2?
Oh, it should be 1e-3 everywhere, thank you!
| return (function(**modified_values) - function_value)/a_delta
| for x_name in grad_input_vars:
|     grad = grad_values[x_name]
We can replace these two lines with `for x_name, grad in grad_values.items():`. Then grad_input_vars is no longer necessary.
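That is, the suggested replacement:

# Iterating over grad_values directly makes grad_input_vars unnecessary.
for x_name, grad in grad_values.items():
    ...  # the per-input numerical check goes here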
| raise ValueError("numerical_grads must be a bool or 'if_possible', not {}"
|                  .format(numerical_grads))
| input_vars = [x for x in symbol.list_input_variables()]
Can we simply replace this with input_vars = symbol.list_input_variables()?
Also added some tests for check_function itself. The main thing that is left to do is something like a tutorial on what to do if the numerical gradient check fails.
Simplified numerical grad checking, improved docs. Wasn't sure where to put the text on what to do when the numgrad check fails, so I put it under Python API/NNVM API/nnvm.testing. @kazum @kevinthesun @srkreddy1238 could you please review this again?
| shape : Dict[nnvm.Symbol or str, Tuple[int]] or Tuple[int], optional
| A dict mapping input variable names to shapes, or just a single shape.
| By default shapes will be inferred automatically.
How can we infer input shapes? I think this argument shouldn't be optional.
They are inferred from the variables' attributes:
x = sym.Variable("x", shape=(1, 2))
This interface doesn't place any constraints on the input symbol, so I can write code like:
data = sym.Variable("data")
net = sym.conv2d(data, ...)
check_function(net, shape=None, ...)
It will raise an exception later, since no shape information is provided? Currently this kind of input is treated as valid.
Also for kwargs, we can directly pass default values instead of just None. Then we don't need to write a lot of "if xxx is None:" statements at the beginning.
Oh, the error message is not nice in this case. I think making it nicer and clarifying the documentation should be enough.
| if shape is not None:
|     nnvm.compiler.graph_attr.set_shape_inputs(backward_graph, shape)
| backward_graph = backward_graph.apply('InferShape').apply('InferType')
https://github.com/dmlc/tvm/blob/master/nnvm/tests/python/compiler/test_top_level1.py#L32-L37
We just need to update the shape of head_grads in the input shape dictionary when compiling.
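A sketch of that pattern (out_shape is assumed to be the already-inferred shape of the forward output; set_shape_inputs is the helper quoted above):

# Extend the user-provided shape dict with head_grads before compiling
# the backward graph, instead of relying on inference alone.
shape = dict(shape or {})
shape["head_grads"] = out_shape  # assumed available from the forward graph
nnvm.compiler.graph_attr.set_shape_inputs(backward_graph, shape)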
| Testing new operations
| ----------------------
| When adding new operations, it it a good idea to test them. Testing
“it it” => “it is”
| all_shapes = graph.json_attr('shape')
| all_dtypes = graph.json_attr('dtype')
| all_dtypes = [None if t == -1 else TCODE_TO_DTYPE[t] for t in all_dtypes]
TCODE_TO_DTYPE[-1] is defined as None in graph_attr.py. The if-else expression is not necessary.
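So the quoted line could simply become:

all_dtypes = [TCODE_TO_DTYPE[t] for t in all_dtypes]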
| if None in dtype.values():
|     raise ValueError("Input variables with no type: {}".format(dtype))
| if any([not s for s in shape.values()]):
`not all(shape.values())` looks simpler.
| if (exclude_targets is not None and (target in exclude_targets or
|                                      str(target) in exclude_targets)) or\
|    (only_targets is not None and not (target in only_targets or
|                                       str(target) in only_targets)):
The expression is a bit complicated. I’d suggest splitting them:
if exclude_targets is not None:
if target in exclude_targets or str(target) in exclude_targets:
logging.info(...)
continue
if only_targets is not None:
if target not in only_targets and str(target) not in only_targets:
logging.info(...)
continue
@srkreddy1238 @kevinthesun @kazum could you review again in light of the new changes, and explicitly approve if it looks good to you?
Thanks everyone. This is now merged.
* [NNVM][TEST] Numerical gradient testing
* [NNVM][TEST] Make some tests a little faster
* Fix the failing test_top_level3
* Target exclusion for the check_function
* Try to ignore singularities
* grad_input_vars now can't contain shapes
* Don't pass unnecessary grad_input_vars to check_function
* Multiple outputs; fixes; testing of check_function
* Use numerical_grads_params to pass parameters to numgrad checker
* Fail when no action is requested explicitly
* Pass additional params to functions
* Silence the linter issue
* Simplified numgrad checking
* Improved docs for check_function
* Fixed the error message when no dtype is provided
* Several fixes
* Tests with shape/dtype inference for inputs
* Don't check dense's grads on cuda
* Raise an error if output dtypes haven't been inferred
* Moved shape/dtype inference into a separate function; use float32 as fallback
* Remove redundant dtype=float32
* Fix multiple outputs
* Use check_function in the rest of the test_top_level1
This PR moves the helper function that checks nnvm operations against numpy reference implementations from `test_top_level#` to `nnvm.testing`, and adds the ability to compare symbolic gradients to numerically computed gradients. It also fixes a wrong gradient implementation for the division operation (which was found using this approach).

Currently only the tests that used the old helper function have been rewritten to use the new function. There are still many tests which didn't use the helper function; they should also be rewritten to use this function where possible, but this is left for future work.
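For reference, a hedged usage sketch of the new helper based on the call shapes that appear in this thread (the exact signature of the merged check_function may differ):

import numpy as np
import nnvm.symbol as sym
from nnvm.testing import check_function

x = sym.Variable("x")
y = sym.log(x)  # operation under test

def forward(x):
    return np.log(x)

def backward(head_grads, x):
    # Symbolic gradient contracted with head_grads, as discussed above.
    return [head_grads / x]

# Compares the nnvm computation against the numpy reference and checks
# the symbolic gradients against numerically computed ones; in_range
# keeps sample points away from log's singularity at zero.
check_function(y, forward=forward, backward=backward,
               shape={"x": (1, 2, 3)}, in_range=(0.1, 10.0),
               numerical_grads=True)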