Improvements to numeric exponent rules #224
willtebbutt
left a comment
LGTM. Just needs version bump.

Thanks! This function's pretty important, so I'll hold off on merging for a few days for others to review.

It's also not clear to me whether this should be considered a breaking change.

Hmm yeah, it depends on whether or not we call this a bug. I want to call it a bug because the previous behaviour is a bit odd and is inconsistent with what we're generally aiming for. However, it might cause some people's code to change behaviour.

Yeah, the main (only?) place a user could see a change would be if they passed the exponent as an int to an entry point of AD, e.g.:

    julia> Zygote.gradient((x, p) -> x^real(p), -10.0, 2)  # real's pullback drops the imaginary part

on master:

    (-20.0, 230.25850929940458)

this pr:

    (-20.0, 0.0)

But realistically, the gradient on master cannot be used for anything like gradient descent, because if the 2 is perturbed by a non-integer value, then the gradient will raise a

Sounds like the old behaviour was a bug.
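
For reference, the exponent component that master reports in the comparison above can be reproduced by hand. This is a sketch that assumes the general rule uses the principal branch of the complex logarithm, which is consistent with the numbers quoted in the thread; for a real base $x < 0$:

$$
x^p = \exp\bigl(p \operatorname{Log} x\bigr), \qquad \operatorname{Log} x = \ln\lvert x\rvert + i\pi, \qquad \frac{\partial}{\partial p} x^p = x^p \operatorname{Log} x .
$$

At $x = -10$, $p = 2$ this gives

$$
\frac{\partial}{\partial p} x^p = 100\,(\ln 10 + i\pi), \qquad \operatorname{Re}\bigl[100\,(\ln 10 + i\pi)\bigr] = 100 \ln 10 \approx 230.2585,
$$

which is the value left over once `real`'s pullback discards the imaginary part. The base component is $p\,x^{p-1} = 2 \cdot (-10) = -20$ in both versions.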

I think so too. If there are no objections, I'll merge on Tuesday (after bumping version number).

This PR makes a few improvements to the rules for `^(::Number, ::Number)`.

For the general rule, it slightly improves the efficiency by removing the second `^` call.

It also adds a new real rule that avoids unnecessarily complexifying the tangents and cotangents. The general rule embeds the numbers in the complex plane for complex differentiation. However, for a negative real base, exponentiation is undefined (it literally throws an error) unless the exponent is exactly an integer. So for a negative base, the derivative wrt the exponent is actually undefined (hence we can't even call FD on it). The new rule adopts the subgradient convention when the base is negative.
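
To make the two conventions concrete, here is a minimal sketch in plain Julia. The names `real_pow_partials` and `complex_embedded_dp` are hypothetical helpers written for illustration; they are not the code in this PR, and treating a zero base the same way as a negative base is an assumption of the sketch.

```julia
# Hypothetical helper (not the PR's implementation): value and partials of x^p
# for real inputs, using a zero partial wrt the exponent for a non-positive base.
function real_pow_partials(x::Real, p::Real)
    y = x^p                                # throws DomainError for x < 0 and non-integer p
    dx = p * x^(p - 1)                     # partial wrt the base
    dp = x > 0 ? y * log(x) : zero(y)      # subgradient convention: 0 when x <= 0 (assumption for x == 0)
    return y, dx, dp
end

# Hypothetical helper: the exponent partial obtained by embedding the base in
# the complex plane and then keeping only the real part.
complex_embedded_dp(x::Real, p::Real) = real(x^p * log(complex(x)))

y, dx, dp = real_pow_partials(-10.0, 2)
println((y, dx, dp))                       # (100.0, -20.0, 0.0)
println(complex_embedded_dp(-10.0, 2))     # real part of 100 * log(-10 + 0im)
```

With a negative base the sketch returns a zero partial for the exponent, whereas the complex embedding returns a nonzero real part even though `x^p` cannot be evaluated for any nearby non-integer `p`.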

Oh, and since the rules are defined in `fastmath.jl`, I moved the test to the corresponding test file.

Here are a few examples:

Example 1: `frule` with positive real base, real exponent

Only the type has changed. Instead of getting a purely real complex tangent, we just get the equivalent real tangent. The `rrule` has the same behavior.

on master:
this pr:

Example 2: `frule` with negative real base, integer exponent

We get the same tangent as we would have gotten had the input tangent on `p` been `Zero()`.

on master:
this pr:

Example 3: `rrule` with negative real base, integer exponent

The cotangent on `p` is 0.

on master:
this pr: