Added casts to GDN, DeepFactorized, and ContinuousBase for better mixed precision compatibility #105

MahmoudAshraf97 wants to merge 3 commits into tensorflow:master from MahmoudAshraf97:master
Conversation
- changed `tf.convert_to_tensor` to `tf.cast` for better mixed precision support
- added cast to floatx for logits for better mixed precision support
- added cast to floatx for quantization inputs for better mixed precision support
Hi Mahmoud, thanks for the PR, but I'm not sure this is the best way to deal with mixed precision. Can you explain which parts of the model you want to have in which precision? All the layer and entropy model classes have a `dtype` argument.
The `dtype` argument works perfectly in single precision mode, whether that's float16 or float32. With mixed precision, however, different types are used for the forward pass, for back propagation, and for the calculation and storage of the weights, all of which is handled automatically by TensorFlow. Hard-coding the dtype into a layer or object via the `dtype` argument therefore will not work, because the layer must be flexible enough to work with multiple dtypes in the same model at different training stages. Also, I used a version of TFC 2.2 with this commit for a while, and training a model with fp32 and with mixed precision converged to the same result, so I think it's safe to assume it yields correct results.
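To illustrate the split described here: under mixed precision, Keras keeps variables in a "variable dtype" (float32) while running the forward computation in a "compute dtype" (float16). A minimal numpy sketch of that pattern (the class name and shapes are made up for illustration; Keras does this automatically):

```python
import numpy as np

class MixedPrecisionDense:
    """Toy dense layer mimicking Keras mixed precision:
    weights are stored in float32, computation runs in float16."""

    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        # Variables are kept in the "variable dtype" (float32).
        self.w = rng.standard_normal((n_in, n_out)).astype(np.float32)

    def __call__(self, x):
        # Inputs and weights are cast to the "compute dtype" (float16)
        # for the forward pass.
        x16 = x.astype(np.float16)
        w16 = self.w.astype(np.float16)
        return x16 @ w16

layer = MixedPrecisionDense(4, 2)
y = layer(np.ones((3, 4), dtype=np.float32))
print(layer.w.dtype, y.dtype)  # float32 float16
```

This is why a single hard-coded `dtype` cannot describe such a layer: two dtypes are in play at once.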
I see, thanks for the explanation. We weren't aware of the Keras developments regarding mixed precision. I'll look into this some more and will keep you posted.
We added full mixed precision support in commit c20abdb. Can you please check if this solves your issue? |
@jonycgn unfortunately, the mentioned commit does not resolve the issue. Using the code at https://github.com/MahmoudAshraf97/AutoencoderCompression with TF 2.8 and TFC 2.8 still throws the error. Although the documentation of the function states that it accepts tensor objects, it throws the same error if a tensor is passed with a different dtype than the requested dtype; check this code snippet for a quick example:
Passing any dtype value other than tf.float32 gives an error. Nevertheless, the commit handled mixed precision training in a good way; the problem here is the strange behavior of `tf.convert_to_tensor`. My proposal to solve this issue is to handle TF tensors using `tf.cast` and to use `tf.convert_to_tensor` for other objects.
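The proposed dispatch can be sketched without TensorFlow. Here a tiny stand-in `Tensor` type models the behavior under discussion: converting raises on a dtype mismatch (as `tf.convert_to_tensor` does), while casting re-types the value (as `tf.cast` does). All names below are illustrative, not the TFC API:

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    """Toy stand-in for a TF tensor: a value with a dtype tag."""
    value: float
    dtype: str

def convert_to_tensor(x, dtype):
    """Models tf.convert_to_tensor: refuses tensors of a different dtype."""
    if isinstance(x, Tensor):
        if x.dtype != dtype:
            raise ValueError(f"cannot convert {x.dtype} tensor to {dtype}")
        return x
    return Tensor(float(x), dtype)

def cast(x, dtype):
    """Models tf.cast: always re-types an existing tensor."""
    return Tensor(x.value, dtype)

def to_tensor(x, dtype):
    """The proposed fix: cast existing tensors, convert everything else."""
    if isinstance(x, Tensor):
        return cast(x, dtype)
    return convert_to_tensor(x, dtype)

t16 = Tensor(1.0, "float16")
print(to_tensor(t16, "float32").dtype)  # float32, no error
print(to_tensor(3, "float32").dtype)    # float32
```

With this dispatch, a float16 tensor arriving at a float32 layer is silently re-typed instead of raising, which is exactly the case mixed precision produces.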
Could you clarify which specific instances of `tf.convert_to_tensor` are causing the problem?
FWIW, I can't see your code at https://github.com/MahmoudAshraf97/AutoencoderCompression. Is it private?
If you hit this problem in line 178, this would indicate that the analysis transform did not output a tensor that has the same dtype as the entropy model expects.
tl;dr: In my code I didn't explicitly set the dtype for any layer or tensor, so they're all handled by Keras and work perfectly in single precision, whether float16 or float32. As I stated in an earlier comment, this is a Colab notebook that presents a minimal example to try with.
If mixed precision is enabled without setting any dtype explicitly, the error occurs.
Thanks for your explanation. However, I don't see why the input to the entropy model should be a variable. Certainly that could be the case for some models, but typically it will not be a variable; instead, it would be the output of some encoder-side neural network. These outputs should have the compute dtype. Edit: Also, in the code that you pointed to in your repository,
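A quick way to see whether you are in this situation: check the latent dtype against the entropy model's dtype at the call site, and cast defensively if they differ. A numpy sketch with made-up names (the constant and helper are hypothetical, not part of TFC):

```python
import numpy as np

# Hypothetical: the dtype the entropy model was built with.
ENTROPY_MODEL_DTYPE = np.float32

def ensure_dtype(latents, dtype=ENTROPY_MODEL_DTYPE):
    """Defensive cast at the encoder / entropy-model boundary."""
    if latents.dtype != dtype:
        latents = latents.astype(dtype)
    return latents

# Under mixed precision, the encoder output carries the compute dtype:
latents = np.ones((2, 4), dtype=np.float16)
safe = ensure_dtype(latents)
print(safe.dtype)  # float32
```

If the assertion that encoder outputs arrive in float16 holds in your model, this boundary cast is the minimal workaround while the dtype handling is sorted out upstream.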
I might have used the wrong term when I said that the input will be a variable. The actual case is that in mixed precision, Keras uses tf.float32 for everything other than computations; this includes all tensors, as documented here. So naturally the input to the entropy model, or to any layer, is in variable_dtype, which defaults to tf.float32. As for my code, I don't think the problem occurs with my code only, as my experiments with the example notebook suggest otherwise. Anyway, here is a full traceback from the error in my code:
I'm not sure this is the right way to interpret mixed precision training. If what you are saying is correct, this would mean that every layer would need to do its computations in float32. We have a unit test here (and analogous ones for the other classes) that ensures that if the input tensor is float16, the output is float16 as well.
Hey, we added support for mixed precision to the models. If the current code doesn't work for you, could you try to find out what is different in your codebase and let me know? I'm reluctant to switch all calls from `tf.convert_to_tensor` to `tf.cast`.
Closing this due to inactivity. Please reopen in case you experience further issues with mixed precision training. |
Hello
These changes are meant to support mixed precision training, because explicitly instantiating the layers and objects in float16 throws errors in operations involving other layers' variables and weights.