Sometimes faster sum(f,x) rule #529
Conversation
Zygote test failure
src/rulesets/Base/base.jl
Outdated
```julia
    return (x, identity_pullback)
end

derivatives_given_output(Ω, ::typeof(identity), x) = tuple(tuple(true))
```
I am not sure how I feel about overloading this directly, rather than leaving it for `@scalar_rule` to do.
How should I feel?
Yeah, I don't know. It's a bit weird to have ever more functions you have to know to define. But some functions don't use the macro.
For some functions, it might in fact be ideal to provide several methods of this, if you can equally well compute the derivative using input or output. Using input only is what this PR exploits, but using output only might be useful for fusing some broadcast things. Something like relu can be equally efficient either way. The macro can't really figure that out.
We will need to keep thinking about that, but let's do this, so we have an example.
We can always back it out later, as it is an implementation detail.
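The relu point above can be sketched without ChainRulesCore. These helper names (`deriv_relu_from_input`, `deriv_relu_from_output`) are purely illustrative, not the real API; the point is that the same derivative is reachable from either side:

```julia
# Hypothetical helpers (illustrative names, not the real ChainRulesCore API):
# for relu, the derivative can be computed from either the input or the output.
relu(x) = max(x, zero(x))

deriv_relu_from_input(x)  = x > 0 ? one(x) : zero(x)  # uses only the input
deriv_relu_from_output(y) = y > 0 ? one(y) : zero(y)  # uses only the output

xs = [-2.0, 0.5, 3.0]
deriv_relu_from_input.(xs) == deriv_relu_from_output.(relu.(xs))  # true
```

A macro like `@scalar_rule` only ever sees one expression for the derivative, so it cannot know that both formulations are equally cheap here.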
src/rulesets/Base/mapreduce.jl
Outdated
```julia
# Then we can compute the forward pass as usual, save nothing but `xs`:
y = sum(f, xs; dims=dims)
function sum_pullback_easy(dy)
    dxs = unthunk(dy) .* conj.(only.(only.(derivatives_given_output.(nothing, f, xs))))
```
Should we have something like `_siso_derivatives_given_output(f, x) = only(only(derivatives_given_output(nothing, f, x)))`, so this can be:

```julia
# before
dxs = unthunk(dy) .* conj.(only.(only.(derivatives_given_output.(nothing, f, xs))))
# after
dxs = unthunk(dy) .* conj.(_siso_derivatives_given_output.(f, xs))
```
Or `only_derivative_given_output`. But I'm not sure it's worth the extra complication of one more function to track down and know about.
I made this a do-block broadcast like the other cases now, as I think that's more readable.
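To make the nested `only(only(...))` unwrapping concrete, here is a toy stand-in for `derivatives_given_output` (the name `toy_derivatives_given_output` and the `abs2` method are assumptions for illustration; the real function lives in ChainRulesCore). The `tuple(tuple(...))` shape mirrors the convention of one derivative per input per output:

```julia
# Toy stand-in (hypothetical, for illustration only): returns a 1-tuple of
# 1-tuples, matching the one-input, one-output shape the rule expects.
toy_derivatives_given_output(Ω, ::typeof(abs2), x) = tuple(tuple(2x))

xs = [1.0, 2.0, 3.0]
dy = 1.0
# The two `only` calls strip the nested 1-tuples down to the scalar derivative:
dxs = dy .* conj.(only.(only.(toy_derivatives_given_output.(nothing, abs2, xs))))
# dxs == [2.0, 4.0, 6.0], the gradient of sum(abs2, xs)
```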
src/rulesets/Base/mapreduce.jl
Outdated
```julia
function sum_pullback_f(dy)
    # For arrays of arrays, we ought to protect the element against broadcasting:
    dys = dims isa Colon ? Ref(unthunk(dy)) : unthunk(dy)
```
It may be more performant to use a 1-tuple rather than `Ref`:

```julia
# before
dys = dims isa Colon ? Ref(unthunk(dy)) : unthunk(dy)
# after
dys = dims isa Colon ? (unthunk(dy),) : unthunk(dy)
```
Isn't `Ref` the standard thing, and what's used internally? I think this makes the intention a little clearer.

```julia
julia> Broadcast.broadcastable(:x)
Base.RefValue{Symbol}(:x)

julia> Broadcast.broadcastable(sin)
Base.RefValue{typeof(sin)}(sin)
```
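A quick check that the two spellings behave identically under broadcasting: both make the wrapped array act as a single element rather than being iterated over.

```julia
v = [1.0, 2.0]
# Both wrappers protect the vector from broadcasting over its elements:
a = Ref(v) .* [10, 20]   # [[10.0, 20.0], [20.0, 40.0]]
b = (v,) .* [10, 20]
a == b                   # true
```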
src/rulesets/Base/mapreduce.jl
Outdated
```julia
fx_and_pullbacks = map(x -> rrule_via_ad(config, f, x), xs)
y = sum(first, fx_and_pullbacks; dims=dims)

function sum_pullback_f(dy)
```
Can we bring back the unicode? I feel it actually adds worthwhile clarity here.
See what you think of the current level of unicode-ness.
src/rulesets/Base/mapreduce.jl
Outdated
```julia
# Then at least `f` has no gradient. Note that broadcasting here
# gets the shape right with or without `dims` keyword.
dxs = broadcast(fx_and_pullbacks, dys) do (_, back), dy1
    unthunk(last(back(dy1)))
```
I don't think the `unthunk` is required anymore, as didn't we fix `ProjectTo`?

```julia
# before
unthunk(last(back(dy1)))
# after
last(back(dy1))
```
I think my test case here was `sum(sum, ...)`, where you get an `InplaceableThunk` from `back` here.
And `ProjectTo` does not like those:

```julia
julia> proj = ProjectTo([1,2,3]);

julia> ith = rrule(sum, [1,2,3])[2](1)[2]; ith isa InplaceableThunk
true

julia> proj(ith)
ERROR: MethodError: no method matching (::ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ProjectTo{Float64, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}}}}})(::InplaceableThunk{Thunk{ChainRules.var"#1409#1412"{Int64, Colon, Vector{Int64}, ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ProjectTo{Float64, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}}}}}}}, ChainRules.var"#1408#1411"{Int64, Colon}})
```
Although I think at some point I wanted to make it un-wrap, insert itself into the @thunk part, and return that.
In which case you'd get an array of thunks back, not an array of arrays. Not sure what we think about that.
The first test which fails without this unthunk is:

```julia
julia> test_rrule(sum, sum, [[1,2], [3,4], [5,6]]; check_inferred=false)
test_rrule: sum on typeof(sum),Vector{Vector{Int64}}: Test Failed at /Users/me/.julia/packages/ChainRulesTestUtils/XI7i2/src/testers.jl:307
  Expression: ad_cotangent isa NoTangent
   Evaluated: Thunk{ComposedFunction{ProjectTo{AbstractArray, NamedTuple{(:element, :axes), ...

julia> CFG = ChainRulesTestUtils.ADviaRuleConfig();

julia> rrule(CFG, sum, sum, [[1,2], [3,4], [5,6]])[2](1.0)
(NoTangent(), NoTangent(), Thunk{ComposedFunction{ProjectTo{AbstractArray, NamedTuple{(:element, :axes), ...
```

This works on the tagged version; something unthunks:

```julia
julia> test_rrule(sum, sum, [[1,2], [3,4], [5,6]]; check_inferred=false)
Test Summary:                                        | Pass  Total  Time
test_rrule: sum on typeof(sum),Vector{Vector{Int64}} |    7      7  0.8s
Test.DefaultTestSet("test_rrule: sum on typeof(sum),Vector{Vector{Int64}}", Any[], 7, false, false, true, 1.645230607920931e9, 1.645230608709222e9)

julia> CFG = ChainRulesTestUtils.ADviaRuleConfig();

julia> rrule(CFG, sum, sum, [[1,2], [3,4], [5,6]])[2](1.0)
(NoTangent(), NoTangent(), [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])

(@v1.8) pkg> st ChainRules
Status `~/.julia/environments/v1.8/Project.toml`
⌃ [082447d4] ChainRules v1.26.0
```
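For readers unfamiliar with thunks, here is a minimal sketch of the idea (`ToyThunk` and `toy_unthunk` are hypothetical stand-ins, not ChainRulesCore's `Thunk`/`unthunk`): a thunk delays a gradient computation, and forcing it is a no-op on values that are already materialized. The failure above happens when a still-wrapped thunk reaches code that only accepts materialized arrays.

```julia
# Toy types (not ChainRulesCore's): a thunk wraps a delayed computation.
struct ToyThunk{F}
    f::F
end
toy_unthunk(t::ToyThunk) = t.f()  # force the delayed computation
toy_unthunk(x) = x                # no-op on anything already materialized

lazy = ToyThunk(() -> [1.0, 1.0])
toy_unthunk(lazy)         # == [1.0, 1.0], the forced result
toy_unthunk([1.0, 1.0])   # returned unchanged
```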
src/rulesets/Base/mapreduce.jl
Outdated
```julia
    ∇prod_dims!(dx, vald, x, dy, y)
    return dx
end
∇prod_dims(::Val{dims}, x, dy::AbstractZero, y=prod(x; dims=dims)) where {dims} = dy
```
Is this meant to be part of this PR?

It's here because I found out it was missing when writing a test for this PR. Could be done separately, I guess.
src/rulesets/Base/mapreduce.jl
Outdated
```julia
    ∇prod!(dx, x, dy, y)
    return dx
end
∇prod(x, dy::AbstractZero, y::Number=prod(x)) = dy
```
Is this meant to be part of this PR?
src/rulesets/Base/mapreduce.jl
Outdated
```julia
        return dx
    end
end
∇cumprod_dim(vald::Val{dim}, x::AbstractArray, dy::AbstractZero, y=cumprod(x; dims=dim)) where {dim} = dy
```
Is this meant to be part of this PR?
src/rulesets/Base/mapreduce.jl
Outdated
```julia
    ∇cumprod!(dx, x, dy, y)
    return dx
end
∇cumprod(x::AbstractVector, dy::AbstractZero, y=cumprod(x)) = dy
```
Is this meant to be part of this PR?
Is this waiting for me to respond to something?

No, it's waiting for me to circle back, sorry.

I think this is ready to go. It would be nice to fix #85 on top of it; there are various ways, but probably better to explore in another PR.

Sorry for responding late, I was away. Would it be better for @oxinabox to review this instead, since she has the context?

Bump? I think this is fine, and faster. But if we wait long enough, eventually it will rot.
mzgubic left a comment

Thanks for being persistent; it's a great addition and it would be a shame if it got stale. Looks like a nice improvement to me overall.
oxinabox left a comment

Yep, OK, a few last comments to address, then merge when happy.
Sorry about dropping the ball on this one.
I am really hoping I can find time to catch up on my GitHub backlog soon.
Co-authored-by: Frames Catherine White <oxinabox@ucc.asn.au>
This does two things to the `sum(f,x)` rule.

First, it is a bit more efficient in how many temporary arrays it creates. It closes over the array of `(y, back)` tuples instead of making a new array for just the pullbacks. And when broadcasting the pullbacks, it avoids making a tuple in cases where `f` doesn't have a gradient anyway.

Second, it checks `derivatives_given_output` (added in ChainRulesCore.jl#456) to see if the gradient can be computed from just the input. If so, it can avoid storing the pullbacks at all.

Best, good, and worst case times:
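A toy sketch (illustrative code, not the actual rule) contrasting the two pullback strategies the description mentions, using a concrete `square` in place of a general `f`:

```julia
square(x) = x^2
xs = [1.0, 2.0, 3.0]

# Generic path: run f once per element and keep a (y, pullback) pair for each,
# analogous to map(x -> rrule_via_ad(config, f, x), xs).
ys_and_backs = map(x -> (square(x), dy -> 2x * dy), xs)
y = sum(first, ys_and_backs)                          # 14.0
dxs_generic = map(((_, back),) -> back(1.0), ys_and_backs)

# Easy path: the derivative 2x is known from the input alone,
# so nothing but xs needs to be saved for the backward pass.
dxs_easy = 1.0 .* 2 .* xs

dxs_generic == dxs_easy   # true: [2.0, 4.0, 6.0] either way
```

The easy path allocates no array of closures, which is where the speedup in the title comes from.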