Do not register Dropout(12) as a training-only kernel. #3859
Conversation
Which model requires Dropout for inferencing?
@pranavsharma Maybe @SherlockNoMad may know? I just know Dropout started as an inference operator, then came TrainableDropout, and then finally came the merger called Dropout(12).
Yeah, given that this was never registered for inferencing so far, I'm wondering which model requires it?
@pranavsharma My guess is the Dropout(12) inference registration was missed while we were morphing TrainableDropout into Dropout(12) with some minor changes like training_mode, but I'll let @SherlockNoMad finish this answer.
Dropout has been there since opset 10+, so that means it was previously registered for inferencing and only the updated version is/was not, right?
@faxu Correct. |
+1.
Ok. In any case, Dropout gets eliminated in inferencing (see dropout_elimination.cc).
|
If it's always eliminated, then it's not needed for inferencing, right? If that's the case, maybe it should not be registered, to minimize binary size. @codemzs ?
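As a rough illustration of what a graph-level pass like dropout_elimination.cc does (this is a pure-Python toy, not ORT's C++ implementation; the node representation and function name here are hypothetical), eliminating Dropout amounts to dropping the node and rewiring its consumers to the Dropout input:

```python
# Toy sketch of dropout elimination at the graph level.
# A node is (op_type, inputs, outputs); Dropout's first output is the data.

def eliminate_dropout(nodes):
    """Remove Dropout nodes and rewire consumers to the Dropout's input."""
    rename = {}  # eliminated Dropout data-output name -> its input name
    kept = []
    for op_type, inputs, outputs in nodes:
        if op_type == "Dropout":
            rename[outputs[0]] = inputs[0]
        else:
            kept.append((op_type, inputs, outputs))

    def resolve(name):
        # Follow chains in case of consecutive Dropout nodes.
        while name in rename:
            name = rename[name]
        return name

    return [(op, [resolve(i) for i in inputs], outputs)
            for op, inputs, outputs in kept]

graph = [
    ("MatMul",  ["x", "w"], ["h"]),
    ("Dropout", ["h"],      ["h_drop", "mask"]),
    ("Relu",    ["h_drop"], ["y"]),
]
print(eliminate_dropout(graph))
# The Dropout node is gone and Relu now reads "h" directly.
```

This only removes the data edge; a real pass must also check that the mask output is unused before eliminating the node.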
@pranavsharma Thanks, and agreed that Dropout is a no-op in inference mode, but for the sake of completeness, and since this was already registered as an inference op, don't you feel Dropout(12) should also be registered as an inference op?
@faxu It's for completeness, and we have always registered Dropout ops as inference ops. Our ONNX tests for Dropout(12) are also failing. The binary size will increase a bit, and I want to check with @pranavsharma whether he is ok with that. I'll let @faxu and @pranavsharma make a call on this and won't insist. Just let me know soon, thanks! CC: @SherlockNoMad
* Do not register Dropout(12) as a training-only kernel.
* Move the Dropout forward implementation into the inference project.
* Fix inference build test failures.
* Remove the fp16 test since fp16 support is absent on CPU.
* Fix build break.
Currently, Dropout(12) is registered as a training-only kernel, but doing so breaks inference-only scenarios where ORT is not compiled with training operators. We must register Dropout(12) as an inference operator so that it is available to both inferencing and training builds, just like the Slice op, where the forward op is registered as an inference op while the backward op is registered as training-only.
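To make the no-op point concrete, here is a minimal NumPy sketch of Dropout(12) semantics (a reference illustration under my reading of the ONNX spec, not ORT's kernel): with training_mode=False, which is the inference default, the op returns its input unchanged, so it is safe to eliminate from inference graphs; the kernel still needs to exist in inference builds for graphs that keep the node.

```python
import numpy as np

def dropout_v12(x, ratio=0.5, training_mode=False, seed=None):
    """Sketch of Dropout(12): returns (output, mask).

    Inference (training_mode=False): identity, mask of all ones.
    Training: zero out elements with probability `ratio` and scale the
    survivors by 1/(1-ratio) so the expected value matches inference.
    """
    if not training_mode or ratio == 0.0:
        return x, np.ones_like(x, dtype=bool)
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) >= ratio
    return x * mask / (1.0 - ratio), mask

x = np.arange(6, dtype=np.float32).reshape(2, 3)
y, mask = dropout_v12(x, training_mode=False)
assert np.array_equal(y, x) and mask.all()  # inference mode is a no-op
```

This is exactly the Slice-style split argued for above: the forward (inference) behavior is trivial and belongs in the inference build, while only the gradient computation is training-specific.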