Fixes gradient, Jacobian, Hessian, and vjp tests #2
mohdibntarek merged 15 commits into JuliaDiff:mt/interface from …
Conversation
Co-authored-by: Mohamed Tarek <mohamed82008@gmail.com>
|  | Pass | Total |
| --- | --- | --- |
| AbstractDifferentiation.jl | 117 | 117 |

The issue within the jvp tests was that `v = (rand(length(xvec)), rand(length(yvec)))` was interpreted as one directional derivative, i.e.

```julia
x = to_vec(xvec, yvec)
ẋ = to_vec(v)
fdm(ε -> f(x .+ ε .* ẋ), zero(eltype(x)))
```

The fix now just augments `v` to …
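A hedged sketch of what such an augmentation could look like (the exact form is elided above; the zero-tangent padding is my assumption):

```julia
v = (rand(3), rand(2))  # one tangent per input, as in the tests

# Pad each component with a zero tangent for the other input, so the
# two directions are pushed forward independently instead of jointly.
augmented_v = ((v[1], zero(v[2])), (zero(v[1]), v[2]))
```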
```julia
    return y * derivative(d.backend, d.f, d.xs...)
end

function Base.:*(d::LazyDerivative, y::Union{Number,Tuple})
```
I don't think we should support multiplication if the derivative returns a tuple. This part of the code can be simplified.
Don't we want to support functions like `fder(x, y) = exp(y) * x + y * log(x)` for the `LazyOperator`s as well?
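For concreteness, a small worked version of that example (the hand-written partial derivatives are mine, not from the PR):

```julia
fder(x, y) = exp(y) * x + y * log(x)

# Partial derivatives w.r.t. x and y, so the full derivative of fder
# at (x, y) is naturally a 2-tuple rather than a single number:
dfder_dx(x, y) = exp(y) + y / x
dfder_dy(x, y) = exp(y) * x + log(x)

fder_derivative(x, y) = (dfder_dx(x, y), dfder_dy(x, y))
```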
```julia
Base.:*(y, d::LazyGradient) = y * gradient(d.backend, d.f, d.xs...)

function Base.:*(d::LazyGradient, y::Union{Number,Tuple})
    if d.xs isa Tuple
```
Why is `xs` sometimes a tuple and sometimes not?
I wanted to support:

```julia
AD.LazyJacobian(fdm_backend, x -> fjac(x, yvec), xvec)
```

where `xs` is then only a vector `xvec` and not `(xvec,)`.
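Side by side, the two conventions being discussed (the tuple form below is my assumed counterpart; `fdm_backend`, `fjac`, `xvec`, and `yvec` are the names used in the tests):

```julia
# xs is a plain vector: differentiate w.r.t. x only, with y closed over.
jac_x = AD.LazyJacobian(fdm_backend, x -> fjac(x, yvec), xvec)

# xs is a tuple: differentiate w.r.t. both inputs (assumed form).
jac_xy = AD.LazyJacobian(fdm_backend, fjac, xvec, yvec)
```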
src/AbstractDifferentiation.jl (Outdated)
```julia
    end
end

function Base.:*(d::LazyJacobian, ys::AbstractArray)
```
I don't think this does the right thing if `ys` is a matrix.
Should we throw an error in that case, or fix it manually by applying `vec`? I have now combined these `AbstractArray` specializations with the general fallback; they were just differing by details like the `(y,)` wrapping.
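A minimal sketch of that consolidation, assuming the general fallback already handles tuples and the array method really only differed by the `(y,)` wrapping:

```julia
# Normalize an array right-hand side into a 1-tuple and defer to the
# general tuple fallback (a sketch, not the exact code in this PR).
Base.:*(d::LazyJacobian, ys::AbstractArray) = d * (ys,)
```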
src/AbstractDifferentiation.jl (Outdated)
```julia
    end
end

function Base.:*(d::LazyHessian, ys::AbstractArray)
```
How is this function different from the general fallback?
src/AbstractDifferentiation.jl (Outdated)
```julia
    end
end

function Base.:*(ys::AbstractArray, d::LazyHessian)
```
Can we combine this with the general fallback?
Good idea! See my comment above as well. When I started, I thought it made sense to dispatch on them all separately because of the tiny differences between arrays, tuples, and numbers as input, but it leads to much more code and is less readable.
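Under the same assumption, both `LazyHessian` array methods could collapse into the tuple fallback:

```julia
# Wrap the array argument in a 1-tuple and defer to the general tuple
# fallback, for right and left multiplication alike (sketch).
Base.:*(d::LazyHessian, ys::AbstractArray) = d * (ys,)
Base.:*(ys::AbstractArray, d::LazyHessian) = (ys,) * d
```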
```julia
            pff(cols)
        end
    end
elseif eltype(identity_like) <: AbstractMatrix
```
Could you please comment this part? It's not clear to me what's happening here.
This was the fix I added for the computation of the Hessian. `identity_like` and `cols` look as follows in that case:

```julia
identity_like = ([1.0 0.0 0.0 0.0 0.0; 0.0 1.0 0.0 0.0 0.0; 0.0 0.0 1.0 0.0 0.0; 0.0 0.0 0.0 1.0 0.0; 0.0 0.0 0.0 0.0 1.0],)
identity_like[1] = [1.0 0.0 0.0 0.0 0.0; 0.0 1.0 0.0 0.0 0.0; 0.0 0.0 1.0 0.0 0.0; 0.0 0.0 0.0 1.0 0.0; 0.0 0.0 0.0 0.0 1.0]
cols = [1.0, 0.0, 0.0, 0.0, 0.0]
cols = [0.0, 1.0, 0.0, 0.0, 0.0]
cols = [0.0, 0.0, 1.0, 0.0, 0.0]
cols = [0.0, 0.0, 0.0, 1.0, 0.0]
cols = [0.0, 0.0, 0.0, 0.0, 1.0]
```

So it was mainly to fix the input/output for the additional pushforward that is used. I'll need to check in a bit more detail whether one can simplify that function further. IIRC, without it I got dimension errors in the jvp of FiniteDifferences.jl because it would have pushed forward matrices like `identity_like[1]` instead of the columns.
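A sketch of the column handling described above; `pff` stands in for the pushforward closure from the surrounding Hessian code, so the placeholder definition here is only to make the snippet run:

```julia
using LinearAlgebra

pff(vs) = vs  # placeholder for the pushforward closure

identity_like = (Matrix{Float64}(I, 5, 5),)  # as printed above

# Push forward each basis column separately instead of the whole
# identity matrix, which is what triggered the dimension errors.
hess_cols = map(eachcol(identity_like[1])) do cols
    pff((cols,))
end
```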
But I think you are right that this needs some additional care. I think the current version will break down once we'd like to compute a tuple of Hessians.
```julia
v = (rand(length(xvec)), rand(length(yvec)))
pf1 = AD.pushforward_function(fdm_backend, fjac, xvec, yvec)(v)

if fdm_backend isa FDMBackend2  # augmented version of v
```
Could you explain this part?
This is related to the comment above: #2 (comment).

The resulting error traces back to the following: when we use `v = (rand(length(xvec)), rand(length(yvec)))` with our `FDMBackend2` as the vector that we'd like to push forward through a function `fvec(x, y)` that takes two inputs `x` and `y`, we get an output that corresponds to the directional derivative with `v[1]` as the direction along `x` and `v[2]` as the direction along `y` simultaneously. The other backends, however, interpret `v` as: change `x` along the direction `v[1]` while keeping `y` fixed, and change `y` along the direction `v[2]` while keeping `x` fixed.
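The difference can be reproduced directly with FiniteDifferences.jl (a minimal example of mine, not taken from the PR):

```julia
using FiniteDifferences

f(x, y) = x .* y
x, y = rand(3), rand(3)
v = (rand(3), rand(3))
fdm = central_fdm(5, 1)

# One joint directional derivative along (v[1], v[2]):
jvp_joint = FiniteDifferences.jvp(fdm, f, (x, v[1]), (y, v[2]))

# Two independent directional derivatives, the other input held fixed:
jvp_x = FiniteDifferences.jvp(fdm, f, (x, v[1]), (y, zero(y)))
jvp_y = FiniteDifferences.jvp(fdm, f, (x, zero(x)), (y, v[2]))

# By linearity, jvp_joint ≈ jvp_x + jvp_y, which is why augmenting v
# with zero tangents recovers the per-input semantics.
```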
|
Looks good overall. Just left a few comments because I couldn't immediately understand what's happening in a couple of places.
Current test status:
There are basically two fixes:

1. The wrong `x` was used to compute the gradients, so the value was computed wrongly.
2. For an `AbstractMatrix` input, the code didn't loop over its columns but vectorized the entire matrix, which led to a dimension error.

It remains to fix the jvp case.
For example, in `fdm_backend2`, the jvp results in a single output vector instead of a tuple of two output vectors.
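In other words (a comment-only sketch using the test names; `fdm_backend2` is the FiniteDifferences-based backend):

```julia
v = (rand(length(xvec)), rand(length(yvec)))
pf = AD.pushforward_function(fdm_backend2, fjac, xvec, yvec)

# Expected per-input semantics: pf(v) is a tuple of two vectors.
# Observed before the fix: pf(v) is a single vector, i.e. one joint
# directional derivative along (v[1], v[2]).
```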