@@ -72,6 +72,7 @@ Backticks were added manually.
7272 * [Upcasting to non-`@differentiable` functions](#upcasting-to-non-differentiable-functions)
7373 * [Implied generic constraints](#implied-generic-constraints)
7474 * [Non-differentiable parameters](#non-differentiable-parameters)
75+ * [Higher-order functions and currying](#higher-order-functions-and-currying)
7576 * [Differential operators](#differential-operators)
7677 * [Differential-producing differential operators](#differential-producing-differential-operators)
7778 * [Pullback-producing differential operators](#pullback-producing-differential-operators)
@@ -88,7 +89,6 @@ Backticks were added manually.
8889 * [Convolutional neural networks (CNN)](#convolutional-neural-networks-cnn)
8990 * [Recurrent neural networks (RNN)](#recurrent-neural-networks-rnn)
9091* [Future directions](#future-directions)
91- * [Differentiation of higher-order functions](#differentiation-of-higher-order-functions)
9292 * [Higher-order differentiation](#higher-order-differentiation)
9393 * [Naming conventions for numerical computing](#naming-conventions-for-numerical-computing)
9494* [Source compatibility](#source-compatibility)
@@ -2002,6 +2002,42 @@ _ = f0 as @differentiable (@noDerivative Float, Float) -> Float
20022002_ = f0 as @differentiable (@noDerivative Float, @noDerivative Float) -> Float
20032003```
20042004
2005+ #### Higher-order functions and currying
2006+
2007+ As defined above, the `@differentiable` function type attribute requires all
2008+ non-`@noDerivative` arguments and results to conform to the `Differentiable`
2009+ protocol. However, there is one exception: when the type of an argument or
2010+ result is a function type, e.g. `@differentiable (T) -> @differentiable (U) ->
2011+ V`. This is because we need to differentiate higher-order functions.
2012+
2013+ Mathematically, the differentiability of `@differentiable (T, U) -> V` is
2014+ similar to that of `@differentiable (T) -> @differentiable (U) -> V` in that
2015+ differentiating either one will provide derivatives with respect to parameters
2016+ `T` and `U`. Here are some examples of first-order function types and their
2017+ corresponding curried function types:
2018+
2019+ | First-order function type | Curried function type |
2020+ | @differentiable (T, U) -> V | @differentiable (T) -> @differentiable (U) -> V |
2021+ | @differentiable (T, @noDerivative U) -> V | @differentiable (T) -> (U) -> V |
2022+ | @differentiable (@noDerivative T, U) -> V | (T) -> @differentiable (U) -> V |
2023+
2024+ A curried differentiable function can be formed like any curried
2025+ non-differentiable function in Swift.
2026+
2027+ ```swift
2028+ func curry<T, U, V>(
2029+     _ f: @escaping @differentiable (T, U) -> V  // `f` escapes into the returned closure
2030+ ) -> @differentiable (T) -> @differentiable (U) -> V {
2031+     { x in { y in f(x, y) } }
2032+ }
2033+ ```
2034+
2035+ This works because the compiler internally assigns a tangent bundle to any
2036+ closure that captures variables. This tangent bundle is existentially typed
2037+ because closure contexts are type-erased in Swift. The theory behind the typing
2038+ rules has been published as [The Differentiable
2039+ Curry](https://www.semanticscholar.org/paper/The-Differentiable-Curry-Plotkin-Brain/187078bfb159c78cc8c78c3bbe81a9176b3a6e02).
2040+
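To make the shape of the curried type concrete, here is a minimal sketch of the ordinary, non-differentiable counterpart of `curry`. It is only an illustration: it compiles with stock Swift and needs no differentiation support, and the names `curryPlain`, `multiply`, `timesTwo`, and `product` are hypothetical, not part of the proposal.

```swift
// Hypothetical helper for illustration only; not part of the proposal.
// `f` is marked `@escaping` because the returned closures capture it.
func curryPlain<T, U, V>(_ f: @escaping (T, U) -> V) -> (T) -> (U) -> V {
    { x in { y in f(x, y) } }
}

// An ordinary two-argument function value.
let multiply: (Float, Float) -> Float = { $0 * $1 }

// Supplying the first argument yields a closure that captures `x == 2`.
let timesTwo = curryPlain(multiply)(2)

// Supplying the second argument completes the call.
let product = timesTwo(3)
```

The differentiable `curry` has the same shape. The difference is that the inner closure's captured value `x` also needs a tangent, which is why the closure context is given a tangent bundle.
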
20052041### Differential operators
20062042
20072043The core differentiation APIs are the differential operators. Differential
@@ -2456,30 +2492,6 @@ typealias LSTM<Scalar: TensorFlowFloatingPoint> = RNN<LSTMCell<Scalar>>
24562492
24572493## Future directions
24582494
2459- ### Differentiation of higher-order functions
2460-
2461- Mathematically, the differentiability of `@differentiable (T, U) -> V` is
2462- similar to that of `@differentiable (T) -> @differentiable (U) -> V` in that
2463- differentiating either one will provide derivatives with respect to parameters
2464- `T` and `U`.
2465-
2466- To form a `@differentiable (T) -> @differentiable (U) -> V`, the most natural
2467- thing to do is currying, which one might implement as:
2468-
2469- ```swift
2470- func curry<T, U, V>(
2471- _ f: @differentiable (T, U) -> V
2472- ) -> @differentiable (T) -> @differentiable (U) -> V {
2473- { x in { y in f(x, y) } }
2474- }
2475- ```
2476-
2477- However, the compiler does not support currying today due to known
2478- type-theoretical constraints and implementation complexity regarding
2479- differentiating a closure with respect to the values it captures. Fortunately,
2480- we have a formally proven solution in the works, but we would like to defer this
2481- to a future proposal since it is purely additive to the existing semantics.
2482-
24832495### Higher-order differentiation
24842496
24852497Distinct from differentiation of higher-order functions, higher-order