Document what to do at nondifferentiable points #419
Codecov Report
@@            Coverage Diff             @@
##             main     #419      +/-   ##
==========================================
+ Coverage   92.91%   93.03%   +0.11%
==========================================
  Files          15       15
  Lines         819      833      +14
==========================================
+ Hits          761      775      +14
  Misses         58       58
I have dropped almost all the sub/super stuff and just stuck to lots of practical examples with discussion.
Co-authored-by: Mason Protter <mason.protter@icloud.com>
Co-authored-by: Miha Zgubic <mzgubic@users.noreply.github.com>
74e52b0 to 35f8633
|
|
||
| This has a number of advantages. | ||
| - It follows the rule that derivatives are zero at local minima (and maxima). | ||
| - If you leave a gradient decent optimizer running it will eventually actually converge absolutely to the point -- where as with it being 1 or -1 it would never outright converge it would always flee. |
| - If you leave a gradient decent optimizer running it will eventually actually converge absolutely to the point -- where as with it being 1 or -1 it would never outright converge it would always flee. | |
| - If you leave a gradient decent optimizer running it will eventually actually converge absolutely to the point -- where as with it being 1 or -1 it would never outright converge it would always flee. |
The word "flee" is evocative, but maybe a little confusing here. Maybe we could say "oscillate" or "wobble" instead.
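A minimal sketch of the behaviour under discussion, in plain Julia with no ChainRules.jl API involved. The names `dabs_zero`, `dabs_one`, and `descend` are hypothetical helpers made up for this illustration, and the step size is an assumption chosen so that the iterates land exactly on 0; with other step sizes the picture is less clean.

```julia
# Two hypothetical conventions for the derivative of abs at x = 0.
dabs_zero(x) = x > 0 ? 1.0 : (x < 0 ? -1.0 : 0.0)  # report 0 at the kink
dabs_one(x)  = x >= 0 ? 1.0 : -1.0                 # report the right-hand derivative at the kink

# Fixed-step gradient descent that records every iterate.
function descend(deriv, x0; step=0.25, iters=8)
    xs = [x0]
    for _ in 1:iters
        push!(xs, xs[end] - step * deriv(xs[end]))
    end
    return xs
end

xs_zero = descend(dabs_zero, 1.0)  # reaches 0.0 and stays there
xs_one  = descend(dabs_one, 1.0)   # reaches 0.0, then alternates between -0.25 and 0.0
```

With the 0 convention the iterates stay put once they hit the minimum. With the one-sided convention they keep stepping off it and back again, which is closer to "oscillate" than to "flee".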
| ``` | ||
|
|
||
| We do not have to worry about what to return for the side where it is not defined. | ||
| As we will never be asked for the derivative at e.g. `x=-2.5` since the primal function errors. |
Just a comment, and I'm not sure it's important, but the primal won't error if we make the argument complex. In that case there's the interesting issue of the branch cut.
docs/src/nondiff_points.md
Outdated
| - If the derivative from one side is finite and the other isn't, say it is the derivative taken from finite side. | ||
| - When derivative from each side is not equal, strongly consider reporting the average | ||
|
|
||
| Our goal as always, is to get a pragmatically useful result for everyone, which must by necessity also avoid a pathological result for anyone. No newline at end of file |
Maybe worth mentioning that we can't always get the result that's best for literally everyone, but we sometimes just have to do our best.
mzgubic
left a comment
Content looks good, and is definitely a useful addition.
My only suggestion would be to split this in two: keep the short-version paragraph at the top in "writing good rules" and link to the rest of the text, which IMO belongs in the "maths" section.
The "writing good rules" section is too long already, but I will move this under math.
awf
left a comment
Sorry, I know it's annoying to add comments after you've already merged. I'm happy to do a PR instead if that's easier.
|
|
||
| This has a number of advantages. | ||
| - It follows the rule that derivatives are zero at local minima (and maxima). | ||
| - If you leave a gradient decent optimizer running it will eventually actually converge absolutely to the point -- where as with it being 1 or -1 it would never outright converge it would always flee. |
|
|
||
| The other option for `x->ceil(x)` would be relax the problem into `x->x`, and thus say it is 1 everywhere | ||
| But that it too weird, if the use wanted a relaxation of the problem then they would provide one. | ||
| We can not be imposing that relaxation on to `ceil` for everyone is not reasonable. |
We can not be imposing that relaxation on to ceil for everyone
or
Imposing that relaxation on to ceil for everyone is not reasonable.
|
|
||
| We do not have to worry about what to return for the side where it is not defined. | ||
| As we will never be asked for the derivative at e.g. `x=-2.5` since the primal function errors. | ||
| But we do need to worry about at the boundary -- if that boundary point doesn't error. |
Maybe replace with
But we do need to worry about at the boundary. The function is defined for x=0 (because exp is defined at -Inf), but AD will return <what will it return? Is it NaN?>
| As we will never be asked for the derivative at e.g. `x=-2.5` since the primal function errors. | ||
| But we do need to worry about at the boundary -- if that boundary point doesn't error. | ||
|
|
||
| Since we will never be asked about the left-hand side (as the primal errors), we can use just the right-hand side derivative. |
| But this is more or less the same as choosing some large value -- in this case an extremely large value that will rapidly overflow. | ||
|
|
||
|
|
||
| ### Derivative on-finite and different on both sides |
| ``` | ||
|
|
||
| In this example, the primal is defined and finite, so we would like a derivative to defined. | ||
| We are back in the case of a local minimal like we were for `abs`. |
| plot(x-> sign(x) * cbrt(x)) | ||
| ``` | ||
|
|
||
| In this example, the primal is defined and finite, so we would like a derivative to defined. |
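A quick numerical check of why this case is awkward, as a sketch only. `f` is the function from the quoted `plot` call; `left_quotient` and `right_quotient` are hypothetical helper names, and the step sizes are arbitrary choices.

```julia
f(x) = sign(x) * cbrt(x)   # equals abs(x)^(1/3), so x = 0 is a local minimum

# One-sided difference quotients at x = 0 for a step h > 0.
right_quotient(h) = (f(h) - f(0.0)) / h
left_quotient(h)  = (f(0.0) - f(-h)) / h

# Print the quotients for shrinking h to see how each side behaves.
for h in (1e-2, 1e-4, 1e-6)
    println("h = ", h, "  left = ", left_quotient(h), "  right = ", right_quotient(h))
end
```

The two quotients have opposite signs and grow in magnitude as `h` shrinks, so neither one-sided derivative is finite and they disagree, even though the primal is perfectly well behaved at 0.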
| From the case studies a few general rules can be seen for how to choose a value that is _useful_. | ||
| These rough rules are: | ||
| - Say the derivative is 0 at local optima | ||
| - If the derivative from one side is defined and the other isn't, say it is the derivative taken from defined side. |
| These rough rules are: | ||
| - Say the derivative is 0 at local optima | ||
| - If the derivative from one side is defined and the other isn't, say it is the derivative taken from defined side. | ||
| - If the derivative from one side is finite and the other isn't, say it is the derivative taken from finite side. |
| - Say the derivative is 0 at local optima | ||
| - If the derivative from one side is defined and the other isn't, say it is the derivative taken from defined side. | ||
| - If the derivative from one side is finite and the other isn't, say it is the derivative taken from finite side. | ||
| - When derivative from each side is not equal, strongly consider reporting the average |
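The rough rules quoted above could be sketched as a single decision function. This is an illustration only, not anything in ChainRules.jl: `pick_derivative`, its `local_optimum` keyword, and the use of `nothing` to mark a side where the primal is undefined are all assumptions made for the example.

```julia
# Hypothetical helper: choose a value to report at a nondifferentiable point
# from the two one-sided derivatives. `nothing` marks an undefined side.
function pick_derivative(left, right; local_optimum=false)
    local_optimum && return 0.0                         # 0 at local optima
    left === nothing && return right                    # only the right side is defined
    right === nothing && return left                    # only the left side is defined
    isfinite(left) && !isfinite(right) && return left   # prefer the finite side
    isfinite(right) && !isfinite(left) && return right
    # Otherwise report the average. Note that this gives NaN for Inf and -Inf,
    # a case the rough rules leave to judgment.
    return (left + right) / 2
end

d_abs      = pick_derivative(-1.0, 1.0; local_optimum=true)  # like abs at 0
d_boundary = pick_derivative(nothing, 0.5)                   # primal errors on the left
d_finite   = pick_derivative(2.0, Inf)                       # one side is not finite
d_average  = pick_derivative(1.0, 3.0)                       # sides disagree
```

The ordering of the checks is a design choice for the sketch: the local-optimum rule is applied first so that a kink such as the one in `abs` reports 0 rather than an average that merely happens to be 0.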
@awf your comments look good to me, please do make a PR.
Closes #404
This is based on a conversation @awf and I had at RSE-Conf 2018.
We have to some extent been following it in ChainRules.jl since some time before then, so here it is written down more formally.
Here is the Docs Preview
Feedback is appreciated.