Syntactic function arity #32
I'm planning to start work on implementing the change described in this RFC. From a discussion with @stedolan, I understand that the maintainers gave some feedback that they generally agreed with this proposal. Nonetheless, I thought it would be wise to broadcast my plans here, in case the community or maintainers have additional feedback on the RFC. |
|
I have begun reading and wanted to make some comments, but have lacked time to finish reading and think. Stay tuned... |
I get an "over my dead body" vibe from some of the maintainers, so my general feeling is that more discussion may be needed unfortunately. |
|
Which maintainers? We discussed it at the last dev meeting and got approval in principle for it. It’s hardly a major change, the only observable difference is in a corner case that we currently have a specific warning for because all OCaml developers think this is already how it works. |
|
I don't remember reaching a consensus on this RFC at the last dev-meetings. I went back to read my notes but unfortunately there is essentially nothing about it (see below). Maybe I gave up at the end and did not take note of a building consensus, or maybe you heard what you wanted to hear but people didn't actually approve it in principle.
I would still write this down as "more discussion needed". (As to my personal opinion: I'm sympathetic to the change (which is a somewhat weak form of support) and I find the general explanation of the changes and why they matter for performance convincing, but I disagree that this is a minor change. My understanding of the proposal is that we move the internal language from 1-ary abstractions (the standard lambda-calculus) to n-ary abstractions with special support for partial applications, without reflecting the arity in the types. This is easy to explain and specify (although weirdly enough the RFC does not try to specify this), but it is changing the very core of the language to move from a standard construct to a non-standard construct.) |
@gasche Could you encourage those maintainers who feel so strongly that this is a bad idea to say so? So far your comment above is the only negative comment I've received about the proposal. I do recall proposing to implement this at the last meeting, and the only concern I heard was about the availability of reviewer time. (I'm also a little confused by your technical comments: what do you mean by "special support for partial applications", and what do you mean by "the internal language"?) |
My understanding of the RFC is that currently the OCaml syntax and operational semantics are like this, a standard call-by-value lambda-calculus (with pattern-matching): and the proposal is to migrate to the following structure (I am using a list meta-syntax here, I hope this is clear): The syntax I hope that this is clearer? If my model of the proposal is in fact correct, you might want to include it in the RFC itself. Finally, this is a core calculus (what I called an "internal language"). The real OCaml AST also has instead of the current desugaring (on function expressions with at least two arguments) specified as |
|
Note: with the rules above, the matching order for argument patterns is in fact specified to go from right to left. I hadn't thought of this consequence when writing the rule; it is not unreasonable (but still possibly surprising; do we want to specify a left-to-right-declaration-side-order for default values of labelled arguments for example?), but maybe we want to leave more freedom to the implementation to order the matches on the argument, or at least not force a given choice in the desugaring rule. Doing this reasonably nicely suggests a slightly more complex internal syntax for partial applications: |
I did have such a reaction to the idea of reflecting function arities in their types, e.g. giving incompatible types to From what I understand, it's about how Concerning the operational semantics, I don't think we can (or should) give one at the level of the parsetree representation. To me, the operational semantics should be defined after desugaring of many syntactic forms (such as |
The introduction section of the RFC proposes a left-to-right matching order for argument patterns. And, this agrees with my understanding from talking with stedolan. (I'm only saying this to clarify a goal of the RFC. It doesn't change any need to give a clear operational semantics somewhere, nor does it imply that this is the only reasonable semantics.) |
|
@gasche: thanks, that makes things clearer.
No, that is not the current OCaml syntax and semantics. OCaml has n-ary applications but unary lambda, and parameters and arguments may be labelled or optional, so the syntax is something more like: The n-ary applications are semantically important, as The reduction relation of a lambda applied to a series of arguments is complex. A term
The proposal in the RFC is to introduce n-ary abstractions alongside the existing n-ary applications. In particular, this allows the ambiguity in reduction 2 (default arguments) to be removed, as it becomes clear exactly when eliminated default parameters are reduced (i.e. after all of the arguments).
I'm generally against leaving such freedom to the implementation, as underspecified semantics tends to be a source of bugs in user programs.
Thanks for clarifying, that makes more sense! (Indeed, I would expect strong objections to such a backwards-incompatible RFC, but this isn't one) |
I believe that this argument is untenable. The manual explicitly states that function application is left associative, which means that The manual also explicitly states that Generally speaking, function definition and function application are the most fundamental features of the language, and (disregarding labeled arguments and optional arguments) their current semantics is extremely simple. Easy to explain, easy to model (in a formal semantics). Trying to introduce a subtle notion of n-ary functions in a language whose syntax and type system have been designed according to the principle that every function is unary... does not sound like a good idea to me. This RFC remarks that some dark corners of the language pose problems: namely, matching over mutable data and default expressions. These are good remarks. I would strongly argue in favor of cleaning up these dark corners and not changing the syntax or semantics of functions, which is as simple as it can be, and should remain so. Regarding matching over mutable data: just keep the status quo (warning 68), or forbid matching over mutable data altogether. (Remove the |
|
Incidentally, correct me if I am wrong, but I believe it is currently the case that pattern matching over mutable data can be used to violate memory safety. The pattern matching compiler assumes that the data is not modified while it is being matched, but in practice one can violate this assumption by running multiple threads. This is another argument in favor of removing patterns that read mutable data. |
|
Your argument, if I understand it correctly, is that if one disregards labeled arguments, optional arguments, matching over mutable or lazy data, default expressions and But if you disregard all of those features, then this RFC makes no difference. The semantics proposed here and the semantics of unary functions are not distinguishable without these features, so an argument for or against this RFC needs to consider programs that use them. With labeled and optional arguments (in particular, commutation of labels and elimination of optionals), unary applications are already insufficient to describe the semantics. This is why the application form is currently n-ary. With default expressions, the unary semantics is remarkably inefficient, which is why (as far as I'm aware) no released version of OCaml has ever used it. Instead, the semantics of default expressions is intentionally ambiguous in the manual, and the actual implementation tries to adhere to what this RFC proposes (but does not always manage to, because of the lack of n-ary abstractions in the current syntax) With mutable or lazy data, the unary semantics is so confusing that when 4.06 changed function compilation to more closely conform to this semantics, the new behaviour was repeatedly reported as a bug (and eventually, warning 68 was added to trigger whenever it occurs). So while the unary semantics is indeed simpler when these features are unused, that simplicity is not affected by this RFC. When these features are used, the unary semantics is inadequate.
Well spotted. Currently it states in the "Function application" subsection of 7.2 that:
and proceeds to give an explicitly n-ary semantics for applications, yet also lists "function application" as a left-associative operator in the table of precedences in 7.1, which is inconsistent with the above. We should fix the table. |
Hi Stephen! I understand your point. However, I do not completely agree with it. I would like to be able to teach (and write down in my formal semantics of OCaml) that functions are unary and that
I would argue that we should strive to maintain the equivalence law I would also argue that, due to syntactic reasons, the notion of n-ary function application is brittle. When one writes The text that describes n-ary function applications in the manual (Section 7.2) seems fundamentally ambiguous to me because it describes a function application In a world where every function is unary, these problems disappear.
Does it have to be? What about requiring default expressions to be values? Their semantics would then be easier to understand, and they would be easier to optimize. |
|
For reference, I initially implemented ocamlformat assuming the equivalence It seems to me that this RFC would improve the current situation, and not get in the way later if someone works out a (backward incompatible) redesign of (at least) labeled and optional arguments.
In my experience, such a limitation would eliminate a large fraction of the benefit and uses of default expressions. |
|
I would be interested in seeing operational semantics described, in a core language, for OCaml today and for the RFC. I think that it would be useful to handle first the fragment without labelled and optional arguments, and then add labelled and optional arguments. Regarding @fpottier's idea of forbidding evaluation in optional arguments and in patterns:
Two final notes, more theoretical:
|
I don't think this RFC would prevent you from doing that. As I see it, the language already has n-ary functions, and this RFC simply makes it less opaque for the users by specifying exactly when this transformation occurs. And this transformation has a visible impact because of a number of language features such as optional arguments and effectful patterns. Currently the reconstruction of n-ary functions is hidden away in some part of the compiler; what this RFC proposes is to move this decision earlier, propagate it, and clean up the old code for reconstruction. Then, switching to a strict unary function semantics should become as simple as ensuring the parser always generates unary functions. That could even be turned on with either a command-line flag or a configuration option. So if we plan to keep discussing the issues around n-ary functions and applications, I believe that this RFC is a step in the right direction anyway, while still being mostly compatible with the current semantics. |
|
It may be true that this RFC does not make the situation worse than it already is, from my point of view. One thing that I do not clearly understand is whether the RFC proposes only internal changes in the parse tree (and changes in the semantics of existing code) or also proposes to introduce new syntax. E.g., the RFC mentions this code, which as far as I can tell is not accepted at present: But there does not seem to be a clear explanation of what new concrete syntax is added. |
|
I suspect that @stedolan meant to write one of these:
```ocaml
(* with fun *)
fun a b : (int option -> int) -> function
  | None -> a + b
  | Some c -> a + b + c

(* with let *)
let f a b : int option -> int = function
  | None -> a + b
  | Some c -> a + b + c
```
I don't think there was any plan to change the concrete syntax. |
Right, it would be nice to have a simple description for teaching and modelling purposes. However, this model doesn't accurately describe OCaml since the release of OCaml 3.0 in 2000. (This RFC doesn't affect this situation, because it proposes changes to abstraction and not application). Having said that, white lies that are true only of a fragment of the language seem fine for teaching and modelling, as the language in its entirety is rarely taught or modelled. This RFC does, by design, break the equivalence between
I think this impression arises only from viewing function application as an associative operator. In OCaml,
@lthls is correct: no new syntax is being proposed here, and in writing the RFC I mixed up the let-form and the fun-form of function definitions, writing I think the canonical reference for the semantics of OCaml functions with labels remains @garrigue's thesis, or the shorter paper Labeled and optional arguments for Objective Caml. Note that that paper defines an n-ary application form and a unary abstraction form where the function parameter binders are always simple variable patterns. In terms of that semantics, the change proposed here is about how n-ary functions with complex binders are desugared: currently, the matching is interleaved with the arguments (causing the various problems described above), and the proposal is to do the matching after the |
Indeed, OCaml is supposed to be based on lambda-calculus, where application is left associative. I think that it is a bad idea to deviate from this. As you noted, this is already the case in OCaml today, so you may be right in saying that this RFC does not make things worse in this regard. I am afraid that n-ary abstractions may be brittle, in that their recognition might be disturbed by type annotations or scope annotations that occur in the middle of them. If I write Regarding n-ary applications, I believe that they are also syntactically brittle, by which I mean that I am afraid that some constructs (such as |
In defence of OCaml's behaviour, I'd say that the lambda-calculus considers only functions of the form
No. This is not because of the presence of the type annotation, but because you split the function in two. The semantics under this RFC is the same as if you had written: If Similarly, your second example which is different from the case where you had not split the function, if p1 has side-effects. Type and scope annotations do not have any effect here. The distinction between unary and n-ary abstractions is based on concrete syntax: did you use the n-ary function syntax |
|
I agree with what you have written so far and I remain sympathetic to this issue, but note that when you say:
This is not wrong, but I would note that there is an obvious way to extend the lambda-calculus with patterns in function-binding position, which is to desugar My updated summary of the RFC is as follows:
|
|
Your summary sounds right to me. A couple of small tweaks:
I think you got "abstractions" and "applications" swapped here.
Yes, for most forms of effectful pattern. For default arguments, the current semantics is not specified precisely, and approximates the n-ary abstraction semantics. (It is only an approximation because the information needed to do it precisely is currently discarded) |
That's a good property! Overall, the RFC sounds reasonable to me. The features that I dislike preexist and it does not seem to create more harm. |
|
Sorry not to have participated in the discussion yet. Overall, I agree with Stephen's understanding of the semantics of labelled and optional arguments. I will just add a bit of historical background. Originally default arguments were just compiled that way, so that one could say that we had an eager evaluation semantics, both for partial applications and default arguments. Now the goal of this RFC seems to be to define precisely when evaluation of default arguments (and access to mutable patterns) occurs. If this is indeed the goal, I agree that introducing an explicit notion of "evaluation arity" is needed, i.e. no evaluation at all will occur if a function gets fewer arguments than its internal arity. A concern is whether we really want to specify this. One might want to use other compilation strategies in the future. For instance, one could imagine a situation where we would keep information about how to compute defaults in the cmx, to make it more efficient. Also, do we expect the JavaScript backend to conform to this specification? This said, another interesting point of the RFC is about using this arity information to make uncurrying in the backend more predictable. If this is the case, then I think this makes more sense. After all, this problem comes from optimizations that have the native code compiler in mind. So what we are talking about is not so much the semantics of the language, which is already relatively well defined modulo some explicit leeway, but what we can assume the native code compiler will do with the code. So, while I agree that this proposal makes sense, it might be better to see it as a specification of the compiler rather than a specification of the language. (Where somebody will probably tell me that the compiler is the only specification of the language anyway...) |
I think this should be automatic, since JSOO takes bytecode as input. (cc @hhugo) |
There's also melange, so cc @anmonteiro as well |
|
The general sense I get from the discussion in this RFC so far is that now there is a general consensus that the proposal is reasonable and we could move forward. More precisely, but slightly sarcastically (no offense intended), the opinions given were diverse and mostly fell in the following camps:
With this variety of somewhat sympathetic opinions, I think it would be reasonable to move forward with an implementation. |
|
The implementation ocaml/ocaml#12236 has now been merged. A non-trivial point that came up in the implementation but was missing here is the effect of this change on GADT typing. It was previously possible to write a function of the form |
[RFC text copied below]
Syntactic function arity
OCaml has dedicated syntax for n-ary (multiple-parameter) functions, written:
for lambda expressions, or:
for let-bound functions.
However, while the OCaml parsetree already contains n-ary function applications, function definitions are internally always unary. That is, the function definition `fun P1 P2 P3 -> BODY` is currently equivalent to:
The proposal in this RFC is to introduce n-ary function definitions to the parsetree, with semantics so that they evaluate their argument patterns only after all arguments have been received, rather than one at a time. This makes the above equivalent instead to:
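The two readings can be written out by hand for a concrete instance. This is a minimal sketch, assuming a hypothetical two-parameter function whose patterns are lazy patterns, so that matching has an observable effect. The names `unary`, `nary`, `log`, and `note` are illustrative and do not come from the RFC.

```ocaml
(* Record the order in which things happen. *)
let log = Buffer.create 64
let note s = Buffer.add_string log s

(* Unary reading: each pattern is matched as soon as its argument arrives. *)
let unary =
  fun x1 -> match x1 with (lazy a) ->
  fun x2 -> match x2 with (lazy b) ->
  a + b

(* N-ary reading: all arguments are received first, then the patterns are
   matched from left to right. *)
let nary =
  fun x1 x2 ->
  match x1 with (lazy a) ->
  match x2 with (lazy b) ->
  a + b

let () =
  (* A lazy argument that logs when it is forced. *)
  let arg () = lazy (note "forced;"; 1) in
  let partial_unary = unary (arg ()) in
  note "partial-unary-built;";
  ignore (partial_unary (lazy 2));
  let partial_nary = nary (arg ()) in
  note "partial-nary-built;";
  ignore (partial_nary (lazy 2))
```

Because both functions are spelled out explicitly, the contrast does not depend on how a given compiler version treats `fun P1 P2 -> BODY`: the unary form forces its first argument when the partial application is built, and the n-ary form forces it only at the saturated call.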
The motivation for this change is in two parts, explained in detail below:
Semantics: in the relatively rare cases where it makes a difference, the proposed semantics is both clearer and closer to programmer expectations than the current semantics.
Implementation: the proposed semantics would allow us to simplify some tricky and fragile parts of the compiler.
The precise change to the parsetree is discussed in the final section, which requires a careful treatment of the `function` keyword (since the reasonably common pattern `let f a b = function ...` defines a function of arity 3, not 2).
Semantic changes
The proposed change is a change to the order of evaluation of function argument binding, which can make a difference only via side effects. Such side effects are relatively rare, but do occur in mutable patterns, optional argument defaults, and patterns that raise exceptions.
Mutable patterns
With the current semantics of n-ary function definitions, the following program prints `0` both times (code from issue #7789):
The read of the field `n` by `f` happens at the time of the creation of the partial application `(f udata)`, rather than at the time of the eventual call `g 1.0`, and the value `0` is cached.
This behaviour was changed to match the specification in 4.06. The current behaviour is sufficiently confusing that it was repeatedly reported as a bug in 4.06 (#7675, #7789, #10032, #10641), and in 4.12 a new warning (68, `match-on-mutable-state-prevent-uncurry`) was introduced which fires if this behaviour ever occurs.
The semantics proposed here amount to returning to the pre-4.06 behaviour, but this time with a specification to match. Warning 68 would also become redundant and be removed.
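The caching effect can be reproduced without relying on any particular compiler version by writing both desugarings out by hand. This is a minimal sketch, not the program from the issue: it assumes a hypothetical record type `data` with a mutable field `n`, and the names `f_unary`, `f_nary`, and `observe` are illustrative (`udata` and `g` echo the names used above).

```ocaml
type data = { mutable n : int }

(* Unary reading: the field is read when the first argument arrives, so a
   partial application caches whatever value it sees at that moment. *)
let f_unary = fun d -> match d with { n } -> fun (_ : float) -> n

(* N-ary reading: the field is read only once both arguments are present. *)
let f_nary = fun d (_ : float) -> match d with { n } -> n

(* Build a partial application, call it, mutate the record, call it again. *)
let observe f =
  let udata = { n = 0 } in
  let g = f udata in
  let first = g 1.0 in
  udata.n <- 10;
  let second = g 1.0 in
  (first, second)

let unary_result = observe f_unary   (* (0, 0): the read was cached *)
let nary_result = observe f_nary     (* (0, 10): the read happens per call *)
```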
Optional argument defaults
Optional arguments can have default values, which are evaluated if the optional argument is not specified, e.g.:
Due to an intentional loophole in the current specification, the exact point at which `default ()` is evaluated is not specified. The behaviour of the current implementation is to try to delay evaluation of defaults until the function body. However, this sometimes results in evaluation at an unexpected time (cf. issues 7531, 5975).
First, since OCaml 4.13, defaults may be evaluated earlier than the function body. This was changed after a number of soundness bugs were discovered in the delaying heuristic in prior versions, as defaults were being delayed past patterns that may depend on their results. In the current behaviour, the rules for exactly when a default is evaluated are subtle. For instance, evaluation order in the above example of `default ()` changes if the variable pattern `y` above changes to the record pattern `{y}` (regardless of mutability).
Second, since functions in the Parsetree are always unary, it is not always clear where the function body is, and defaults may be evaluated later than expected. For instance, consider:
The function returned by `make_counter_1` returns the number of distinct strings it has seen. However, the function returned by `make_counter_2` always returns 1. The difference is that the module open construct `Hashtbl.(...)` in the former is sufficient to block delaying of default evaluation, while in the latter the creation of the hashtable is delayed past `fun s -> ...`.
The semantics proposed here are that default arguments will always be evaluated at the start of the function body, which in most cases matches the current behaviour. The issues that arose with reordering do not affect the proposed semantics, as all arguments are equally delayed.
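The rule "defaults are evaluated at the start of the function body" can be spelled out by hand, which makes the timing independent of any compiler heuristic. This is a minimal sketch, assuming a hypothetical function `f` that takes its optional parameter as a plain option and computes the default itself; `default_calls` and the value `100` are illustrative.

```ocaml
(* Count how often the default expression is evaluated. *)
let default_calls = ref 0
let default () = incr default_calls; 100

(* The optional parameter is received as an option; the default is computed
   at the start of the body, i.e. only after every argument has arrived. *)
let f ?x y z =
  let x = match x with Some v -> v | None -> default () in
  x + y + z

(* Partial application: x is omitted and z is still missing. *)
let g = f 1
let calls_after_partial = !default_calls   (* the body has not run yet *)

let r1 = g 2
let r2 = g 3
let calls_after_two = !default_calls       (* once per saturated call *)
```

Under this reading a partial application never evaluates a default, and each saturated call evaluates it afresh, so a hashtable created by a default would be rebuilt on every call rather than shared.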
Patterns that raise
Nonexhaustive patterns and lazy patterns can raise exceptions during evaluation (as can certain edge cases of unboxed floats, when interrupted by signal handlers). The proposed change to the semantics of n-ary functions will affect functions with such patterns. Specifically, the following example will no longer raise a match failure at definition time:
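A minimal sketch of the kind of definition meant here, assuming a hypothetical function `f` with a refutable first parameter. Whether the partial application itself raises depends on whether the compiler in use implements the semantics proposed in this RFC, so the sketch records that outcome rather than assuming it.

```ocaml
(* Silence the non-exhaustive pattern warning for this illustration. *)
[@@@warning "-8"]

(* The first parameter only matches Some. *)
let f (Some x) y = x + y

(* Does building the partial application (the value the text calls g)
   already raise? True under the unary semantics, false under the
   proposed n-ary semantics. *)
let raised_at_partial =
  match f None with
  | (_ : int -> int) -> false
  | exception Match_failure _ -> true

(* A saturated call raises under either semantics. *)
let raised_by_call =
  match f None 1 with
  | (_ : int) -> false
  | exception Match_failure _ -> true
```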
Instead, the match failure will only be raised once `g` is called. This is an observable change in behaviour, but it's hard to imagine a program relying on it.
Implementation considerations
The current logic to deal with the interleaving of patterns and lambdas is quite subtle in parts, and these parts could be simplified with the proposed semantics. In particular, the logic for delaying default arguments (described above) would be simplified, as would the logic for first-class module patterns (which for reasons to do with the structure of Lambda, already partially delay binding). Finally, the currying optimisations would become more reliable, as described below.
The counterpoint is that some new work would need to be done to plumb the arity of function expressions from parsetree through to Lambda, as these are currently erased by the parser and approximately recovered by Lambda (and are already preserved from Lambda onwards).
Currying detection
Function definitions in OCaml are curried by default, in principle accepting one argument at a time and returning a closure accepting the rest. Since most function applications are saturated (that is, they pass the same number of arguments as the function definition expects), the implementation of functions is optimised to make saturated applications fast.
However, this optimisation requires knowing how many arguments a function definition expects. Since this information is not currently tracked in the parse tree, it must be guessed later, and sometimes this guess is wrong. For example:
Here `make_greeter` is detected as a 3-argument function rather than a 2-argument one, because the partial application of `hello` results in construction of a `Lambda.Lfunction` which is picked up by the currying optimisation as though it were a third argument to `make_greeter`. Saturated two-argument applications of this function are therefore not optimised.
N-ary lambdas would make this issue go away, because the number of parameters of a function is known rather than inferred.
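A hypothetical pair of definitions with the shape described above, as a sketch only: the bodies and parameter names are assumptions, and just the names `hello` and `make_greeter` come from the text. The function is written with two parameters, and its body is a partial application of a labelled function.

```ocaml
(* A function with two labelled parameters. *)
let hello ~greeting ~name = Printf.sprintf "%s, %s!" greeting name

(* Written with two parameters. The body supplies ~name but not ~greeting,
   so the result is itself a function awaiting ~greeting. *)
let make_greeter first last = hello ~name:(first ^ " " ^ last)

(* A saturated two-argument application, followed by a later call. *)
let greet = make_greeter "Ada" "Lovelace"
let message = greet ~greeting:"Hello"
```

The detected arity is an internal matter and does not change the result of these calls; it only affects whether the two-argument application takes the fast path for saturated calls.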
Parsetree changes
As well as the `fun a b c -> ...` syntax, OCaml additionally supports matching on the final argument of a function using the `function` keyword, allowing for instance the definition of a three-argument function as:
Additionally, a return type annotation may appear in a function:
If the `function` keyword is being used to match on the final argument, then the only place the return type annotation may legally appear is just before the final argument:
Finally, newtypes may appear interspersed with function arguments, as in `fun a (type t) b -> ...`
So, the proposed representation for functions contains a sequence of arguments (either function parameters or newtypes), followed by an optional type annotation, followed by a body (either a single expression or a case list introduced by `function`). The type annotation types the body in either case.
The proposed addition to the parsetree is as follows:
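The shape of that representation can be illustrated with a toy model. This is a simplified sketch, not the RFC's actual definition: all type and constructor names are hypothetical, patterns and type annotations are kept as plain strings, and labels and defaults are left out.

```ocaml
(* A toy expression type with an n-ary function node. *)
type expr =
  | Var of string
  | Function of param list * string option * body
      (* parameters, optional return type annotation, body *)

and param =
  | Param of string     (* an ordinary value parameter *)
  | Newtype of string   (* (type t): binds a type and takes no argument *)

and body =
  | Body of expr                     (* fun a b -> e *)
  | Cases of (string * expr) list    (* fun a b -> function p1 -> e1 | ... *)

(* Arity: one per value parameter, plus one if the body is a case list. *)
let arity params body =
  let is_value = function Param _ -> true | Newtype _ -> false in
  let values = List.length (List.filter is_value params) in
  values + (match body with Cases _ -> 1 | Body _ -> 0)

(* let f a b = function None -> ... | Some c -> ... *)
let three =
  arity [Param "a"; Param "b"]
    (Cases [("None", Var "a"); ("Some c", Var "c")])

(* fun a (type t) b -> a *)
let two = arity [Param "a"; Newtype "t"; Param "b"] (Body (Var "a"))
```

Counting a case-list body as one extra argument is what makes `let f a b = function ...` come out at arity 3, while a newtype contributes nothing to the count.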