
@stedolan (Contributor)

[RFC text copied below]


Syntactic function arity

OCaml has dedicated syntax for n-ary (multiple-parameter) functions, written:

fun a b c -> ...

for lambda expressions, or:

let f a b c = ...

for let-bound functions.

However, while the OCaml parsetree already contains n-ary function applications, function definitions are internally always unary. That is, the function definition fun P1 P2 P3 -> BODY is currently equivalent to:

fun P1 ->
fun P2 ->
fun P3 ->
BODY

The proposal in this RFC is to introduce n-ary function definitions to the parsetree, with semantics so that they evaluate their argument patterns only after all arguments have been received, rather than one at a time. This makes the above equivalent instead to:

fun x1 ->
fun x2 ->
fun x3 ->
let P1 = x1 in
let P2 = x2 in
let P3 = x3 in
BODY

The motivation for this change is in two parts, explained in detail below:

  • Semantics: in the relatively rare cases where it makes a difference, the proposed semantics is both clearer and closer to programmer expectations than the current semantics.

  • Implementation: the proposed semantics would allow us to simplify some tricky and fragile parts of the compiler.

The precise change to the parsetree is discussed in the final section; it requires a careful treatment of the function keyword (since the reasonably common pattern let f a b = function ... defines a function of arity 3, not 2).

Semantic changes

The proposed change is a change to the order of evaluation of function argument binding, which can make a difference only via side effects. Such side effects are relatively rare, but do occur in mutable patterns, optional argument defaults, and patterns that raise exceptions.

Mutable patterns

With the current semantics of n-ary function definitions, the following program prints 0 both times (code from issue #7789):

type data = { mutable n  : int; }
let f { n } t = Printf.printf "n=%d\n%!" n
let go udata g =
  g 1.0;
  udata.n <- 10;
  g 1.0
;;
let udata = { n = 0 } in
go udata (f udata);;

The read of the field n by f happens at the time of the creation of the partial application (f udata), rather than at the time of the eventual call g 1.0, and the value 0 is cached.

This behaviour was changed to match the specification in 4.06. The current behaviour is sufficiently confusing that it was repeatedly reported as a bug in 4.06 (#7675, #7789, #10032, #10641), and in 4.12 a new warning (68, match-on-mutable-state-prevent-uncurry) was introduced which fires if this behaviour ever occurs.

The semantics proposed here amount to returning to the pre-4.06 behaviour, but this time with a specification to match. Warning 68 would also become redundant and be removed.
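
The proposed behaviour can be modelled in today's OCaml by eta-expanding the partial application, so that the { n } pattern is matched only once both arguments are present. The following runnable sketch is mine (it returns n instead of printing it, for easy checking); under the RFC, plain (f udata) would behave the same way:

```ocaml
type data = { mutable n : int }

(* As in the example above, but returning n rather than printing it. *)
let f { n } _t = n

let go udata g =
  let first = g 1.0 in
  udata.n <- 10;
  let second = g 1.0 in
  (first, second)

let () =
  let udata = { n = 0 } in
  (* Eta-expansion delays the match on { n } until g is fully applied,
     which is the behaviour the RFC proposes for (f udata) itself: *)
  assert (go udata (fun t -> f udata t) = (0, 10))
```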

Optional argument defaults

Optional arguments can have default values, which are evaluated if the optional argument is not specified, e.g.:

let f ?(x = default ()) y z = x + y + z

Due to an intentional loophole in the current specification, the exact point at which default () is evaluated is not specified. The behaviour of the current implementation is to try to delay evaluation of defaults until the function body. However, this sometimes results in evaluation at an unexpected time (cf. #7531, #5975).

First, since OCaml 4.13, defaults may be evaluated earlier than the function body. This was changed after a number of soundness bugs were discovered in the delaying heuristic of prior versions, where defaults were being delayed past patterns that may depend on their results. Under the current behaviour, the rules for exactly when a default is evaluated are subtle. For instance, in the example above, the point at which default () is evaluated changes if the variable pattern y is changed to the record pattern {y} (regardless of mutability).

Second, since functions in the Parsetree are always unary, it is not always clear where the function body is, and defaults may be evaluated later than expected. For instance, consider:

type distinct_counter = string -> int

let make_counter_1 ?(tbl = Hashtbl.create 10) () : distinct_counter =
  Hashtbl.(fun s -> replace tbl s (); length tbl)

let make_counter_2 ?(tbl = Hashtbl.create 10) () : distinct_counter =
  fun s -> Hashtbl.replace tbl s (); Hashtbl.length tbl

The function returned by make_counter_1 returns the number of distinct strings it has seen. However, the function returned by make_counter_2 always returns 1. The difference is that the module open construct Hashtbl.(...) in the former is sufficient to block delaying of default evaluation, while in the latter the creation of the hashtable is delayed past fun s -> ....

The semantics proposed here are that default arguments will always be evaluated at the start of the function body, which in most cases matches the current behaviour. The issues that arose with reordering do not affect the proposed semantics, as all arguments are equally delayed.
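
One way to picture the proposed rule in today's OCaml is to make the default binding explicit at the start of the body. This is a hypothetical desugaring sketch of mine (the default function, its counter, and the arithmetic are placeholders, not part of the RFC):

```ocaml
(* Stand-in for an effectful default; counts how often it runs. *)
let default_runs = ref 0
let default () = incr default_runs; 1

(* Sketch of the proposed meaning of
     let f ?(x = default ()) y z = x + y + z
   with the default forced only once all arguments are present: *)
let f ?x y z =
  let x = match x with Some x -> x | None -> default () in
  x + y + z

let () =
  let partial = f 2 in          (* no default evaluated yet *)
  assert (!default_runs = 0);
  assert (partial 3 = 6);       (* default () runs here: 1 + 2 + 3 *)
  assert (!default_runs = 1);
  assert (f ~x:10 2 3 = 15);    (* default not needed *)
  assert (!default_runs = 1)
```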

Patterns that raise

Nonexhaustive patterns and lazy patterns can raise exceptions during evaluation (as can certain edge cases of unboxed floats, when interrupted by signal handlers). The proposed change to the semantics of n-ary functions will affect functions with such patterns. Specifically, the following example will no longer raise a match failure at definition time:

let f (Some x) y = x + y
let g = f None

Instead, the match failure will only be raised once g is called. This is an observable change in behaviour, but it's hard to imagine a program relying on it.
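
The timing difference can be observed directly. The following runnable check (mine) documents the current behaviour; under the RFC, f None would instead return a closure, and the exception would be raised only when that closure is applied:

```ocaml
(* Non-exhaustive pattern in the first parameter (triggers warning 8). *)
let f (Some x) y = x + y

(* Today, (f None) matches the pattern as soon as the first argument
   arrives, so the partial application itself raises: *)
let raises_at_partial_application =
  match f None with
  | _g -> false
  | exception Match_failure _ -> true

let () = assert raises_at_partial_application
```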

Implementation considerations

The current logic to deal with the interleaving of patterns and lambdas is quite subtle in parts, and these parts could be simplified with the proposed semantics. In particular, the logic for delaying default arguments (described above) would be simplified, as would the logic for first-class module patterns (which for reasons to do with the structure of Lambda, already partially delay binding). Finally, the currying optimisations would become more reliable, as described below.

The counterpoint is that some new work would need to be done to plumb the arity of function expressions from the parsetree through to Lambda, as arities are currently erased by the parser and only approximately recovered during translation to Lambda (they are already preserved from Lambda onwards).

Currying detection

Function definitions in OCaml are curried by default, in principle accepting one argument at a time and returning a closure accepting the rest. Since most function applications are saturated (that is, they pass the same number of arguments as the function definition expects), the implementation of functions is optimised to make saturated applications fast.

However, this optimisation requires knowing how many arguments a function definition expects. Since this information is not currently tracked in the parse tree, it must be guessed later, and sometimes this guess is wrong. For example:

let hello ~person ~greeting ppf =
  Printf.fprintf ppf "%s, %s!\n" greeting person

type greeter = { greet : person:string -> unit } [@@unboxed]

let make_greeter ~greeting ppf =
  { greet = hello ~greeting ppf }

Here make_greeter is detected as a 3-argument function rather than a 2-argument one, because the partial application of hello results in construction of a Lambda.Lfunction which is picked up by the currying optimisation as though it were a third argument to make_greeter. Saturated two-argument applications of this function are therefore not optimised.

N-ary lambdas would make this issue go away, because the number of parameters of a function is known rather than inferred.

Parsetree changes

As well as the fun a b c -> ... syntax, OCaml additionally supports matching on the final argument of a function using the function keyword, allowing for instance the definition of a three-argument function as:

let f a b = function
  | None -> a + b
  | Some c -> a + b + c

Additionally, a return type annotation may appear in a function:

fun (a : int) (b : int) (c : int) : int -> a + b + c

If the function keyword is being used to match on the final argument, then the only place the return type annotation may legally appear is just before the final argument:

fun a b : int option -> int = function
  | None -> a + b
  | Some c -> a + b + c

Finally, newtypes may appear interspersed with function arguments, as in fun a (type t) b -> ...

So, the proposed representation for functions contains a sequence of arguments (either function parameters or newtypes), followed by an optional type annotation, followed by a body (either a single expression or a case list introduced by function). The type annotation types the body in either case.

The proposed addition to the parsetree is as follows:

type fun_body =
  | Fun_body of expression
  | Fun_cases of case list
and fun_argument =
  | Arg_simple of pattern
  | Arg_labelled of string * pattern
  | Arg_optional of string * pattern * expression option
  | Arg_newtype of string loc
and expression_desc =
  ...
  | Pexp_function of fun_argument list * core_type option * fun_body
  ...
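
To see how this representation assigns arity 3 to let f a b = function ..., here is a toy model of mine: strings stand in for real patterns, expressions, and types, and the arity function is illustrative, not part of the RFC.

```ocaml
(* Toy stand-ins for the real parsetree payloads. *)
type pattern = string
type expression = string
type core_type = string

type fun_body =
  | Fun_body of expression
  | Fun_cases of (pattern * expression) list

type fun_argument =
  | Arg_simple of pattern
  | Arg_labelled of string * pattern
  | Arg_optional of string * pattern * expression option
  | Arg_newtype of string

(* Representation of: let f a b = function None -> ... | Some c -> ... *)
let f_repr =
  ( [ Arg_simple "a"; Arg_simple "b" ],
    (None : core_type option),
    Fun_cases [ ("None", "a + b"); ("Some c", "a + b + c") ] )

(* A Fun_cases body matches on one further argument, so the arity is
   the number of listed arguments plus one: *)
let arity (args, _, body) =
  List.length args + (match body with Fun_cases _ -> 1 | Fun_body _ -> 0)

let () = assert (arity f_repr = 3)
```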

@ncik-roberts

I'm planning to start work on implementing the change described in this RFC. From a discussion with @stedolan, I understand that the maintainers gave some feedback that they generally agreed with this proposal. Nonetheless, I thought it would be wise to broadcast my plans here, in case the community or maintainers have additional feedback on the RFC.

@fpottier

I have begun reading and wanted to make some comments, but have lacked time to finish reading and think. Stay tuned...

@gasche (Member) commented Mar 21, 2023

I understand that the maintainers gave some feedback that they generally agreed with this proposal.

I get an "over my dead body" vibe from some of the maintainers, so my general feeling is that more discussion may be needed unfortunately.

@lpw25 (Contributor) commented Mar 22, 2023

Which maintainers? We discussed it at the last dev meeting and got approval in principle for it.

It’s hardly a major change; the only observable difference is in a corner case for which we currently have a specific warning, because all OCaml developers think this is already how it works.

@gasche (Member) commented Mar 22, 2023

I don't remember reaching a consensus on this RFC at the last dev-meetings. I went back to read my notes but unfortunately there is essentially nothing about it (see below). Maybe I gave up at the end and did not take note of a building consensus, or maybe you heard what you wanted to hear but people didn't actually approve it in principle.

Syntactic function arity

#32

Gabriel: this proposal changes the way we interpret the current
parsetree, it changes the operational semantics, but not the type
system.

I would still write this down as "more discussion needed".

(As to my personal opinion: I'm sympathetic to the change (which is a somewhat weak form of support) and I find the general explanation of the changes and why they matter for performance convincing, but I disagree that this is a minor change. My understanding of the proposal is that we move the internal language from 1-ary abstractions (the standard lambda-calculus) to n-ary abstractions with special support for partial applications, without reflecting the arity in the types. This is easy to explain and specify (although weirdly enough the RFC does not try to specify this), but it is changing the very core of the language to move from a standard construct to a non-standard construct.)

@stedolan (Contributor, Author)

I get an "over my dead body" vibe from some of the maintainers

@gasche Could you encourage those maintainers who feel so strongly that this is a bad idea to say so? So far your comment above is the only negative comment I've received about the proposal. I do recall proposing to implement this at the last meeting, and the only concern I heard was about the availability of reviewer time.

(I'm also a little confused by your technical comments: what do you mean by "special support for partial applications", and what do you mean by "the internal language"?)

@gasche (Member) commented Mar 22, 2023

I'm also a little confused by your technical comments: what do you mean by "special support for partial applications", and what do you mean by "the internal language"?

My understanding of the RFC is that currently the OCaml syntax and operational semantics are like this, a standard call-by-value lambda-calculus (with pattern-matching):

t ::= t t | fun p -> t | ...

(fun p -> t) v
~>
match v with p -> t

and the proposal is to migrate to the following structure (I am using a list meta-syntax here, I hope this is clear):

t ::= t t | fun [p1, .., pn] -> t | ...

(fun (p :: ps) -> t) v
~>
(fun ps -> match v with p -> t)

(fun [] -> t)
~>
t

The syntax fun [p1..pn] -> t is an n-ary abstraction (with patterns). The first reduction rule in the "new system" corresponds to what I call "special support for partial application": with this rule, the reduction of the match on the argument is delayed until all the other function arguments have been applied. If we rewrote to match v with p1 -> (fun ps -> t) instead we would have something equivalent to the current system, with no "special support for partial application".

I hope that this is clearer? If my model of the proposal is in fact correct, you might want to include it in the RFC itself.

Finally, this is a core calculus (what I called an "internal language"). The real OCaml AST also has function, and your proposal is to desugar in the following way:

fun p1... pn -> function h
:=
fun [p1, ..., pn, x] -> match x with h

fun p1 ... pn -> t   (* otherwise; t is not 'function h' *)
:=
fun [p1, ..., pn] -> t

instead of the current desugaring (on function expressions with at least two arguments) specified as

fun p1 p2 ... pn -> t
:=
fun p1 -> fun p2 ... pn -> t

@gasche (Member) commented Mar 22, 2023

Note: with the rules above, the matching order for argument patterns is in fact specified to go from right to left. I hadn't thought of this consequence when writing the rule. It is not unreasonable (but still possibly surprising; do we want to specify a left-to-right-declaration-side order for default values of labelled arguments, for example?), but maybe we want to leave more freedom to the implementation to order the matches on the argument, or at least not force a given choice in the desugaring rule.

Doing this reasonably nicely suggests a slightly more complex internal syntax for partial applications:

t ::= ... | fun [p1 = v1, .., pn = vn] [q1..qn] -> t

(fun [p1 = v1, .., pn = vn] (q :: qs) -> t) v
~>
fun [p1 = v1, .., pn = vn, q = v] qs -> t

fun [p1 = v1, .., pn = vn] [] -> t
~>
match v1, .., vn with p1, .., pn -> t

@xavierleroy

I get an "over my dead body" vibe from some of the maintainers

I did have such a reaction to the idea of reflecting function arities in their types, e.g. giving incompatible types to fun x y -> x + y and to fun x -> let f y = x + y in f. But this is not what this RFC is about (fortunately!).

From what I understand, it's about how fun pat1 pat2 pat3 -> e is desugared to unary lambda abstractions and to match...with pattern-matchings. And I'm open to changing the current rules if the new rules make more sense.

Concerning the operational semantics, I don't think we can (or should) give one at the level of the parsetree representation. To me, the operational semantics should be defined after desugaring of many syntactic forms (such as fun pat -> e or fun pat1 ... patN -> e) that we keep unexpanded in the parsetree because it's more convenient for error reporting, PPX preprocessing, syntax editing, etc, but not because they have a clear semantics.

@ncik-roberts

Note: with the rules above, the matching order for argument patterns is in fact specified to go from right to left. I hadn't thought of this consequence when writing the rule, it is not unreasonable (but still possibly surprising; do we want to specify a left-to-right-declaration-side order for default values of labelled arguments, for example?)

The introduction section of the RFC proposes a left-to-right matching order for argument patterns. And, this agrees with my understanding from talking with stedolan. (I'm only saying this to clarify a goal of the RFC. It doesn't change any need to give a clear operational semantics somewhere, nor does it imply that this is the only reasonable semantics.)

@stedolan (Contributor, Author)

@gasche: thanks, that makes things clearer.

My understanding of the RFC is that currently the OCaml syntax and operational semantics are like this, a standard call-by-value lambda-calculus (with pattern-matching):

No, that is not the current OCaml syntax and semantics. OCaml has n-ary applications but unary lambda, and parameters and arguments may be labelled or optional, so the syntax is something more like:

    t ::= t [arg1, .., argn] | fun param -> t | ...
param ::= pat | ~x:pat | ?x:(pat = t)
  arg ::= t   | ~x:t   | ?x:t 

The n-ary applications are semantically important, as (f ...) ... is not in general equivalent to f .... For instance:

# let f ?x y ~x = print_endline "hello";;
val f : ?x:'a -> 'b -> x:'c -> unit = <fun>
# (f 1) ~x:2;;
hello
- : unit = ()
# f 1 ~x:2;;
- : x:'_weak1 -> unit = <fun>

The reduction relation of a lambda applied to a series of arguments is complex. A term (fun param -> body) [arg1, ..., argn] is reduced roughly as follows:

  1. If there is a term t in the arg list with the same label as param, then it is removed from the argument list and the application reduces to let pat = t in body [rest1, ..., restn-1].
  2. If the param is ?x:(pat=default) and there is no arg with label x, but there is an arg that is unlabelled, then the application reduces to C[let pat=default in body'] [arg1, ..., argn] where C is an unspecified context such that C[body'] = body. (roughly, C means that the evaluation of default is delayed past various other constructs)
  3. If there is no term in the arg list with the same label as param, then all of the arguments are evaluated to values and the application reduces to fun param -> (body [val1, ..., valn])

The proposal in the RFC is to introduce n-ary abstractions alongside the existing n-ary applications. In particular, this allows the ambiguity in reduction 2 (default arguments) to be removed, as it becomes clear exactly when eliminated default parameters are reduced (i.e. after all of the arguments).

maybe we want to leave more freedom to the implementation to order the matches on the argument

I'm generally against leaving such freedom to the implementation, as underspecified semantics tends to be a source of bugs in user programs.

@xavierleroy:

I get an "over my dead body" vibe from some of the maintainers

I did have such a reaction to the idea of reflecting function arities in their types, e.g. giving incompatible types to fun x y -> x + y and to fun x -> let f y = x + y in f. But this is not what this RFC is about (fortunately!).

Thanks for clarifying, that makes more sense! (Indeed, I would expect strong objections to such a backwards-incompatible RFC, but this isn't one)

@fpottier

The n-ary applications are semantically important, as (f ...) ... is not in general equivalent to f ....

I believe that this argument is untenable. The manual explicitly states that function application is left associative, which means that (f x) y and f x y are equivalent. I would claim that this equivalence is one of the fundamental properties of the language (this is, after all, a so-called functional language) and that we want to preserve it. If this property is not currently true, then this is a problem that needs to be fixed.

The manual also explicitly states that fun p1 p2 -> e is equivalent to fun p1 -> fun p2 -> e. This is another basic property that we must preserve.

Generally speaking, function definition and function application are the most fundamental features of the language, and (disregarding labeled arguments and optional arguments) their current semantics is extremely simple. Easy to explain, easy to model (in a formal semantics). Trying to introduce a subtle notion of n-ary functions in a language whose syntax and type system have been designed according to the principle that every function is unary... does not sound like a good idea to me.

This RFC remarks that some dark corners of the language pose problems: namely, matching over mutable data and default expressions. These are good remarks. I would strongly argue in favor of cleaning up these dark corners and not changing the syntax or semantics of functions, which is as simple as it can be, and should remain so. Regarding matching over mutable data: just keep the status quo (warning 68), or forbid matching over mutable data altogether. (Remove the lazy pattern as well.) Pattern matching has a declarative flavor and should arguably be pure (that is, it should have no side effect at all, except perhaps Match_failure, which is a fatal error, so not really problematic). Regarding labeled arguments, optional arguments, default expressions, I have not thought deeply about them; I would suggest documenting how they are desugared (is this possible?).

@fpottier

Incidentally, correct me if I am wrong, but I believe it is currently the case that pattern matching over mutable data can be used to violate memory safety. The pattern matching compiler assumes that the data is not modified while it is being matched, but in practice one can violate this assumption by running multiple threads. This is another argument in favor of removing patterns that read mutable data.

@stedolan (Contributor, Author)

@fpottier:

Your argument, if I understand it correctly, is that if one disregards labeled arguments, optional arguments, matching over mutable or lazy data, default expressions and Match_failure, then the semantics of unary function abstraction and application is extremely simple (to explain and to model), and the proposal here would break that simplicity.

But if you disregard all of those features, then this RFC makes no difference. The semantics proposed here and the semantics of unary functions are not distinguishable without these features, so an argument for or against this RFC needs to consider programs that use them.

With labeled and optional arguments (in particular, commutation of labels and elimination of optionals), unary applications are already insufficient to describe the semantics. This is why the application form is currently n-ary.

With default expressions, the unary semantics is remarkably inefficient, which is why (as far as I'm aware) no released version of OCaml has ever used it. Instead, the semantics of default expressions is intentionally ambiguous in the manual, and the actual implementation tries to adhere to what this RFC proposes (but does not always manage to, because of the lack of n-ary abstractions in the current syntax).

With mutable or lazy data, the unary semantics is so confusing that when 4.06 changed function compilation to more closely conform to this semantics, the new behaviour was repeatedly reported as a bug (and eventually, warning 68 was added to trigger whenever it occurs).

So while the unary semantics is indeed simpler when these features are unused, that simplicity is not affected by this RFC. When these features are used, the unary semantics is inadequate.


The manual explicitly states that function application is left associative

Well spotted. Currently it states in the "Function application" subsection of 7.2 that:

The expression expr argument1 … argumentn evaluates the expression expr and those appearing in argument1 to argumentn. [...]

and proceeds to give an explicitly n-ary semantics for applications, yet also lists "function application" as a left-associative operator in the table of precedences in 7.1, which is inconsistent with the above. We should fix the table.

@fpottier

this RFC makes no difference

Hi Stephen! I understand your point. However, I do not completely agree with it. I would like to be able to teach (and write down in my formal semantics of OCaml) that functions are unary and that (f x) y and f x y are the same thing. With this RFC, that would be true in a subset of the language, but not true in an absolute sense. The equivalence (f x) y = f x y would be a white lie. Someone who designs a formal semantics of OCaml would have to choose between 1- sticking with simple unary functions and not supporting labeled/optional arguments 2- supporting labeled/optional arguments at the cost of complicating the syntax and semantics of functions.

We should fix the table.

I would argue that we should strive to maintain the equivalence law (f x) y = f x y and fix the aspects of the language that invalidate this law.

I would also argue that, due to syntactic reasons, the notion of n-ary function application is brittle. When one writes f x y, is this a binary application, or two nested unary applications? Same question about (f x) y, about A.(f x) y, about (f x : int -> int) y, etc.

The text that describes n-ary function applications in the manual (Section 7.2) seems fundamentally ambiguous to me because it describes a function application e a_1 ... a_n but omits the side condition that this application must be maximal, that is, the expression e must not itself be an application, and the expression e a_1 ... a_n must not itself be nested within a larger application. I believe that this could lead to misunderstandings. Also, the text seems to guarantee that the function produced by e is invoked after all of the arguments a_1 ... a_n have been evaluated, but it is not clear to me whether this guarantee is intentional: in a scenario where the arity k of the function is less than n, do we really intend to forbid evaluating the application e a_1 ... a_k before evaluating the remaining arguments a_{k+1} ... a_n?

In a world where every function is unary, these problems disappear.

With default expressions, the unary semantics is remarkably inefficient

Does it have to be? What about requiring default expressions to be values? Their semantics would then be easier to understand, and they would be easier to optimize.

@jberdine

For reference, I initially implemented ocamlformat assuming the equivalence (f x) y = f x y. In practice it caused some code to be untypable after reformatting. On the issue I opened at the time, @garrigue explained the interaction between n-ary application and the typing rules for labeled and optional arguments. So ocamlformat now preserves the syntactic arity of applications. But it is surprising to users and reported as a bug: ocaml-ppx/ocamlformat#1799.

It seems to me that this RFC would improve the current situation, and not get in the way later if someone works out a (backward incompatible) redesign of (at least) labeled and optional arguments.

What about requiring default expressions to be values? Their semantics would then be easier to understand, and they would be easier to optimize.

In my experience, such a limitation would eliminate a large fraction of the benefit and uses of default expressions.

@gasche (Member) commented Mar 23, 2023

I would be interested in seeing operational semantics described, in a core language, for OCaml today and for the RFC. I think that it would be useful to handle first the fragment without labelled and optional arguments, and then add labelled and optional arguments.

Regarding @fpottier's idea of forbidding evaluation in optional arguments and in patterns:

  • We do currently have a soundness bug in pattern-matching related to mutable state (ocaml#717, "Fix MPR7241: Pattern matching with mutable and lazy patterns is unsound"). It's on my TODO-list somewhere, but I would be perfectly happy to let others adopt it.
  • Reading mutable state is a particularly weak form of side-effect because it is erasable and duplicable in pure code. It is only when interacting with mutable writes that its operational semantics becomes complex. For me having the feature is perfectly reasonable as long as it is easy for both the implementation and the users to predict when those writes may take place.
  • Notice in particular that there is no pattern to read an atomic variable, so we don't have to consider non-cooperative concurrent updates in full generality.
  • I would like to have view patterns or active patterns someday, which assume the ability to run code during pattern-matching execution. Forbidding it in principle would feel like we are punishing people writing pure code.

Two final notes, more theoretical:

  1. In the OCaml match .. with construct, there is no specification ambiguity about the execution of impure computations within patterns (typically lazy patterns): the observed operational semantics should be equivalent to evaluating the value of the scrutinee sequentially against each clause, from top to bottom, stopping on the first match. The problem discussed in this RFC is about the interaction of patterns and function abstractions, so the interleaving of potential side-effects in patterns and delaying/suspending computations. Function abstractions can be thought of as a co-pattern, and impure languages that allow nesting of patterns and co-patterns would also have this effect-ordering problem on match .. with.
  2. In languages with a polarized evaluation order or at least with a call-by-need strategy for terms of function type (the other constructors may be handled differently), match v with p -> fun ps -> t and fun ps -> match v with p -> t are equivalent (assuming no variable clash), they both delay evaluation until the function is fully applied and its result is forced. Having n-ary function abstractions and special support for partial application (I believe that F# does this explicitly for FFI purposes, but I just failed to find it again in the docs) is a way to approach this behavior without a type-directed evaluation order.

@lthls commented Mar 23, 2023

I would like to be able to teach (and write down in my formal semantics of OCaml) that functions are unary and that (f x) y and f x y are the same thing.

I don't think this RFC would prevent you from doing that. As I see it, the language already has n-ary functions, and this RFC simply makes it less opaque for the users by specifying exactly when this transformation occurs. And this transformation has a visible impact because of a number of language features such as optional arguments and effectful patterns.

Currently the reconstruction of n-ary functions is hidden away in some part of the compiler; what this RFC proposes is to move this decision earlier, propagate it, and clean up the old code for reconstruction. Then, switching to a strict unary function semantics should become as simple as ensuring the parser always generates unary functions. That could even be turned on with either a command-line flag or a configuration option.

So if we plan to keep discussing the issues around n-ary functions and applications, I believe that this RFC is a step in the right direction anyway, while still being mostly compatible with the current semantics.

@fpottier

It may be true that this RFC does not make the situation worse than it already is, from my point of view.

One thing that I do not clearly understand is whether the RFC proposes only internal changes in the parse tree (and changes in the semantics of existing code) or also proposes to introduce new syntax.

E.g., the RFC mentions this code, which as far as I can tell is not accepted at present:

fun a b : int option -> int = function
  | None -> a + b
  | Some c -> a + b + c

But there does not seem to be a clear explanation of what new concrete syntax is added.

@lthls
Copy link

lthls commented Mar 23, 2023

I suspect that @stedolan meant to write one of these:

(* with fun *)
fun a b : (int option -> int) -> function
  | None -> a + b
  | Some c -> a + b + c

(* with let *)
let f a b : int option -> int = function
  | None -> a + b
  | Some c -> a + b + c

I don't think there was any plan to change the concrete syntax.

@stedolan
Copy link
Contributor Author

@fpottier:

I would like to be able to teach (and write down in my formal semantics of OCaml) that functions are unary and that (f x) y and f x y are the same thing. With this RFC, that would be true in a subset of the language, but not true in an absolute sense. The equivalence (f x) y = f x y would be a white lie.

Right, it would be nice to have a simple description for teaching and modelling purposes. However, this model hasn't accurately described OCaml since the release of OCaml 3.0 in 2000. (This RFC doesn't affect this situation, because it proposes changes to abstraction and not application). Having said that, white lies that are true only of a fragment of the language seem fine for teaching and modelling, as the language in its entirety is rarely taught or modelled.

This RFC does, by design, break the equivalence between fun PAT1 -> fun PAT2 -> BODY and fun PAT1 PAT2 -> BODY. However, it only does so in the case where PAT1 contains some unusual features. In particular, fun x -> fun y -> BODY remains equivalent to fun x y -> BODY. (The situation here is slightly better than the current situation for applications, as this equivalence is unconditional, while the equivalence between (f x) y and f x y has a side-condition on the type of f).
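To make the broken equivalence concrete, here is a small sketch (not from the RFC) using a mutable-field pattern as the "unusual feature". The curried form matches the record as soon as its first argument arrives, so the field is read early on any compiler version; under this RFC, the n-ary form `fun { n } () -> n` would instead delay the read until both arguments are supplied.

```ocaml
(* The mutable field makes the moment of pattern matching observable. *)
type data = { mutable n : int }

(* Curried form: the record pattern is matched at the first application. *)
let curried = fun { n } -> fun () -> n

let () =
  let r = { n = 0 } in
  let g = curried r in   (* r.n is read here, binding n = 0 *)
  r.n <- 1;              (* this later write is not observed by g *)
  assert (g () = 0)
```

With the n-ary form, the result of the final call would be 1 under the RFC's semantics (the write happens before the patterns are evaluated), whereas the curried form above prints the early value regardless.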

I would also argue that, due to syntactic reasons, the notion of n-ary function application is brittle. When one writes f x y, is this a binary application, or two nested unary applications?

I think this impression arises only from viewing function application as an associative operator. In OCaml, f x y is a binary application, syntactically distinct from (f x) y and f (x y) in the same way that the type a * b * c is distinct from (a * b) * c and a * (b * c).

the RFC mentions this code, which as far as I can tell is not accepted at present

@lthls is correct: no new syntax is being proposed here, and in writing the RFC I mixed up the let-form and the fun-form of function definitions, writing = where I should have written ->.

@gasche:

I think the canonical reference for the semantics of OCaml functions with labels remains @garrigue's thesis, or the shorter paper Labeled and optional arguments for Objective Caml. Note that that paper defines an n-ary application form and a unary abstraction form where the function parameter binders are always simple variable patterns. In terms of that semantics, the change proposed here is about how n-ary functions with complex binders are desugared: currently, the matching is interleaved with the arguments (causing the various problems described above), and the proposal is to do the matching after the -> instead.

@fpottier
Copy link

I think this impression arises only from viewing function application as an associative operator.

Indeed, OCaml is supposed to be based on lambda-calculus, where application is left associative. I think that it is a bad idea to deviate from this. As you noted, this is already the case in OCaml today, so you may be right in saying that this RFC does not make things worse in this regard.

I am afraid that n-ary abstractions may be brittle, in that their recognition might be disturbed by type annotations or scope annotations that occur in the middle of them. If I write let f p1 = (fun p2 -> ... : int -> int), is this recognized as a binary function? If I write let f p1 = A.(fun p2 -> ...), is it recognized as a binary function? I think we should preserve the fact that type annotations and scope annotations do not (accidentally) influence the semantics of programs.

Regarding n-ary applications, I believe that they are also syntactically brittle, by which I mean that I am afraid that some constructs (such as (f x : int -> int) y or A.(f x) y) may not be recognized as binary applications, even though type annotations and scope annotations in principle should not influence the meaning of a program. That said, if I have understood correctly, this RFC actually does not introduce a notion of n-ary application, so I guess this is OK.

@stedolan
Copy link
Contributor Author

Indeed, OCaml is supposed to be based on lambda-calculus, where application is left associative. I think that it is a bad idea to deviate from this. As you noted, this is already the case in OCaml today, so you may be right in saying that this RFC does not make things worse in this regard.

In defence of OCaml's behaviour, I'd say that the lambda-calculus considers only functions of the form fun x -> ..., binding a single unlabeled parameter with a simple variable pattern. If you stick to those, the semantics does indeed match the (CBV) lambda calculus, both before and after this RFC. In OCaml today, function application stops being left associative when you introduce functions with labeled and default parameters, but the pure lambda calculus doesn't really have anything to say about those. This RFC concerns the desugaring of fun P1 P2 -> E into functions that bind only simple variable patterns (in particular, disagreeing with the current behaviour when P1 is side-effecting), but again this seems outside of what lambda-calculus normally considers.

If I write let f p1 = (fun p2 -> ... : int -> int), is this recognized as a binary function?

No. This is not because of the presence of the type annotation, but because you split the function in two. The semantics under this RFC is the same as if you had written:

let f p1 = fun p2 -> ...

If p1 is a pattern with side-effects, then this is (under this RFC) distinct from the following:

let f p1 p2 = ...

Similarly, your second example let f p1 = A.(fun p2 -> ...) has the same semantics as:

let f p1 = fun p2 -> ...

which is different from the case where you had not split the function, if p1 has side-effects.

Type and scope annotations do not have any effect here. The distinction between unary and n-ary abstractions is based on concrete syntax: did you use the n-ary function syntax let f p1 p2 = ... / fun p1 p2 -> ... or not?

@gasche
Copy link
Member

gasche commented Mar 27, 2023

I agree with what you have written so far and I remain sympathetic to this issue, but note that when you say:

This RFC concerns the desugaring of fun P1 P2 -> E into functions that bind only simple variable patterns (in particular, disagreeing with the current behaviour when P1 is side-effecting), but again this seems outside of what lambda-calculus normally considers.

This is not wrong, but I would note that there is an obvious way to extend the lambda-calculus with patterns in function-binding position, which is to desugar fun P1 -> t into fun x -> match x with P1 -> t. This desugaring combines in an obvious way with the pure desugaring of fun P1 P2 -> t into fun P1 -> fun P2 -> t, and this gives a natural interpretation of fun P1 P2 -> t for the lambda-calculus. (This interpretation in fact coincides with the one using a n-ary abstraction when using call-by-need or non-standard polarized evaluation orders, but not in the standard call-by-value.)
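The two desugarings can be written out explicitly in today's OCaml; here is a sketch (names are illustrative) using a lazy pattern to make the evaluation point observable under call-by-value:

```ocaml
(* Unary desugaring of [fun P1 P2 -> t]: the match on P1 runs as soon
   as the first argument is received. *)
let curried = fun x -> (match x with lazy n -> fun y -> n + y)

(* RFC desugaring: both arguments are bound first, then the pattern
   is matched. *)
let nary = fun x -> fun y -> (match x with lazy n -> n + y)

let () =
  let forced1 = ref false in
  let g = curried (lazy (forced1 := true; 1)) in
  assert !forced1;          (* forced by the first application *)
  assert (g 2 = 3);
  let forced2 = ref false in
  let h = nary (lazy (forced2 := true; 1)) in
  assert (not !forced2);    (* not forced yet: only one argument so far *)
  assert (h 2 = 3);
  assert !forced2           (* forced once the second argument arrived *)
```

Under call-by-need the two coincide, as noted above, because forcing would be delayed in both versions until the result is demanded.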

My updated summary of the RFC is as follows:

  • the current semantics of labelled arguments in OCaml requires n-ary abstractions (but 1-ary applications suffice)
  • the current semantics of effectful patterns in OCaml use the reference 1-ary abstraction semantics
  • in practice users work with the n-ary abstraction model in mind anyway, and this gives better performance, so we should move to the n-ary abstraction model for effectful patterns as well
  • in fact, let's be explicit about the n-arity of abstractions in the parsetree directly
  • the current implementation of optional arguments is overly complex because we lack this arity (complex heuristics are used instead), so we can simplify it quite a bit now

@stedolan
Copy link
Contributor Author

Your summary sounds right to me. A couple of small tweaks:

  • the current semantics of labelled arguments in OCaml requires n-ary abstractions (but 1-ary applications suffice)

I think you got "abstractions" and "applications" swapped here.

  • the current semantics of effectful patterns in OCaml use the reference 1-ary abstraction semantics

Yes, for most forms of effectful pattern. For default arguments, the current semantics is not specified precisely, and approximates the n-ary abstraction semantics. (It is only an approximation because the information needed to do it precisely is currently discarded)

@fpottier
Copy link

Type and scope annotations do not have any effect here.

That's a good property!

Overall, the RFC sounds reasonable to me. The features that I dislike preexist and it does not seem to create more harm.

@garrigue
Copy link

Sorry not to have participated in the discussion yet.
This has been a busy year for me (thankfully I'll be much freer starting next week).

Overall, I agree with Stephen's understanding of the semantics of labelled and optional arguments.

I will just add a bit of historical background.
The semantics of the label-selective lambda-calculus was based on commuting reductions, but there is no way to implement that efficiently.
So, from the beginning the approach of the olabl/ocaml implementation has been to compile this to normal functions.
In particular this means that while one can change the order of arguments in applications, the order of abstractions is fixed, and reordered function types are not seen as equivalent. This is also why there was no need for n-ary abstraction.
While I did write down a formal semantics for labelled and optional arguments, default arguments were just seen as syntactic sugar expanding fun ?(x = default) -> body to fun ?x -> let x = match x with None -> default | Some x -> x in body.
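That expansion can be spelled out directly (function names here are illustrative, not from the thesis):

```ocaml
(* The sugared form, with a default for the optional argument. *)
let greet ?(name = "world") () = "hello, " ^ name

(* The expansion described above: [fun ?(x = default) -> body] becomes
   [fun ?x -> let x = match x with None -> default | Some x -> x in body]. *)
let greet' ?name () =
  let name = match name with None -> "world" | Some name -> name in
  "hello, " ^ name

let () =
  assert (greet () = "hello, world");
  assert (greet' () = "hello, world");
  assert (greet ~name:"caml" () = "hello, caml");
  assert (greet' ~name:"caml" () = "hello, caml")
```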

Originally default arguments were just compiled that way, so that one could say that we had an eager evaluation semantics, both for partial applications and default arguments.
However, this caused a code blow-up in labltk, where some functions have more than a dozen default arguments, for which the compiler always made sure that they were evaluated as soon as possible. I don't remember the details of the cause, but the solution was to make the order of evaluation of default arguments unspecified, i.e. they are only guaranteed to be evaluated (any number of times) before the actual body of the function is evaluated (where the definition of the body is actually ambiguous, as explained in the RFC). In practice this seemed sufficient.

Now the goal of this RFC seems to be to define precisely when evaluation of default arguments (and access to mutable patterns) occurs. If this is indeed the goal, I agree that introducing an explicit notion of "evaluation arity" is needed, i.e. no evaluation at all will occur if a function receives fewer arguments than its internal arity. A concern is whether we really want to specify this. One might want to use other compilation strategies in the future. For instance, one could imagine a situation where we would keep information about how to compute defaults in the cmx, to make it more efficient. Also, do we expect the javascript backend to conform to this specification?

This said, another interesting point of the RFC is about using this arity information to make uncurrying in the backend more predictable. If this is the case, then I think this makes more sense. After all, this problem comes from optimizations that have the native code compiler in mind. So what we are talking about is not so much the semantics of the language, which is already relatively well defined modulo some explicit leeway, but what we can assume the native code compiler will do with the code.

So, while I agree that this proposal makes sense, it might be better to see it as a specification of the compiler rather than a specification of the language. (Where somebody will probably tell me that the compiler is the only specification of the language anyway...)

@nojb
Copy link

nojb commented Mar 28, 2023

Also, do we expect the javascript backend to conform to this specification?

I think this should be automatic, since JSOO takes bytecode as input. (cc @hhugo)

@Lupus
Copy link

Lupus commented Mar 28, 2023

I think this should be automatic, since JSOO takes bytecode as input

There's also melange, so cc @anmonteiro as well

@gasche
Copy link
Member

gasche commented Mar 29, 2023

The general sense I get from the discussion in this RFC so far is that now there is a general consensus that the proposal is reasonable and we could move forward.

More precisely, but slightly sarcastically (no offense intended), the opinions given were diverse and mostly fell into the following camps:

  • Simplifying the whole thing as you propose sounds nice.
  • It's not worse than before.
  • Some part of the system (the fragment without labels) becomes more complex but the whole language was already complex before, and the overall complexity and surprise factor arguably decreases.

With this variety of somewhat sympathetic opinions, I think it would be reasonable to move forward with an implementation.

@gasche
Copy link
Member

gasche commented Jul 15, 2023

The implementation ocaml/ocaml#12236 has now been merged.

A non-trivial point that came up in the implementation but was missing here is the effect of this change on GADT typing. It was previously possible to write a function of the form fun K x -> ... whose type reads like type a . a t -> a, where a t is a GADT and K a GADT constructor that introduces an equality between a and an arrow (function) type. Such code is not compatible with the idea of delaying the matching of all parameter patterns until all parameters have been received. This example is now rejected and must instead be written fun K -> (fun x -> ...).
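A sketch of the kind of program affected (the GADT here is hypothetical, chosen so that the constructor reveals a function type):

```ocaml
(* K's equality reveals that ['a] is an arrow type, so a function of
   type [a t -> a] can appear to take a second parameter. *)
type _ t = K : (int -> int) t

(* Pre-change, [fun K x -> x + 1] was accepted at this type. Under the
   new semantics, the match on K must happen before the second
   parameter is bound, so the curried form is required: *)
let apply : type a. a t -> a = fun K -> (fun x -> x + 1)

let () = assert (apply K 3 = 4)
```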
