From 935116542710a8171503bf325b5aa04444c0ef30 Mon Sep 17 00:00:00 2001 From: Joachim Breitner Date: Mon, 2 Nov 2020 16:17:10 +0100 Subject: [PATCH 01/11] Spec: Refactor description of deserializatoin MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This is a significant rewirte of the section in the spec describing deserialization. Some points worth noting: * This replaces the elaboration relation with one that takes a value and a type as input, and returns a value. This makes it clearer that there is no “input _type_” of deeper relevance. * One great benefit: No more long prose about how the rules assume these types to be principal, even if they aren’t. This clarifies questions like [this one](https://github.com/dfinity/candid/pull/126#discussion_r515248927) * Also, the existing rules about “elaborating function values” were kinda bogus, as function values are just accepted as they are. This is much easier now, also good. * Unfortuantely for this application, our textual representation has overloading. Instead of defining yet another abstract value algebra, I did a bit of hand-waving, and defined an “overloading-free fragment” of the textual format for this section. * A fair number of rules becomes simpler. Promising! * I was able to phrase some interesting properties more formally. Also promising! Didn’t actually prove them, though, althogh we should. * Still unclear to me what we mean with “subtyping is complete”, see the section there. * Decoding is still an inductive relation, so not fully algorithmic. This may be an issue with a stright-forward formalization; not all theorem provers like negative occurrences of inductive relations in their rules. * Syntax of the relations up for discussion. --- spec/Candid.md | 260 +++++++++++++++++++++++++++++++------------------ 1 file changed, 167 insertions(+), 93 deletions(-) diff --git a/spec/Candid.md b/spec/Candid.md index 19d725e0b..9de30259b 100644 --- a/spec/Candid.md +++ b/spec/Candid.md @@ -864,142 +864,199 @@ service { ;* } <: service { ;* } service { : ; ;* } <: service { : ; ;* } ``` -### Elaboration +### Decoding -To define the actual coercion function used during deserialisation of a value, we extend the subtyping relation to a ternary *elaboration* relation `T <: T' ~> f`, where `f` is a suitable coercion function of type `T -> T'`. +This subtyping is implemented when decoding a Candid value. We provide a ternary relation `V :? T ~> V'` to describe when a value `V` can be decoded at type `T` to a value `V'` of type `T`. +The binary relation `V !: T` describes that a value cannot be decoded at type `T`, and is defined as +``` +not (∃ v'. :? ~> v') +------ + !: +``` + +Here `V` models an untyped values, and form an abstract data model of both the message in transit (`V`), as well as the the message . In the following, we re-use the syntax of the textual representation, with the following changes to make it free of overloading: + + * Number literals (``) must be immediately enclosed with an `` that clarifies the precise number type. + * The value of type `reserved` is expressed as `(null : reserved)`. + * No other use of ``. #### Primitive Types +Values of primitive types decode successfully at their own type: ``` + ∈ {nat, int, nat8, nat16, nat32, nat64, int8, int16, int32, int64, float32, float64 } +------------------------------------- +( : ) :? ~> ( : ) --------------------------------- - <: ~> \x.x +-------------------- +true :? bool ~> true +---------------------- +false :? bool ~> false ------------------- -Nat <: Int ~> \x.x +------------------------ + :? text ~> +------------------------ +null :? null ~> null +``` ------------------------------- - <: reserved ~> \x.x +Values of type `nat` decode at type `int`: +``` +--------------------------------- +( : nat) :? int ~> ( : int) +``` +Any value decodes at type `reserved`, producing the canonical value of type `reserved`: +``` ------------------------------ -empty <: ~> \_.unreachable + :? reserved ~> null : reserved ``` -#### Options and Vectors +NB: No rule needed for type `empty`, because there are no values of that type. By definition, ` !: empty`. -``` - <: ~> f ---------------------------------------------------- -vec <: vec ~> \x.map_vec f x +#### Vectors -not (null <: ) ---------------------------------- -null <: opt ~> \x.null +Only vectors decode at vector types, and only if all elements decode successfully. -not (null <: ) - <: ~> f +``` + :? ~> ------------------------------------------ - <: opt ~> \x.?(f x) +vec { ;* } :? vec ~> vec { ;* } +``` - <: - <: ~> f ---------------------------------- -opt <: opt - ~> \x.case x of () => null | ?y => if f y = _|_ then null else ?(f y) +#### Options -not (null <: ) - <: - <: ~> f ---------------------------------- - <: opt - ~> \x.if f x = _|_ then null else ?(f x) +The null value decodes at any option type: +``` +----------------------- +null :? opt ~> null ``` -The last two rules cover both cases of subtyping on options. -It *optimistically* tries to decode an option value -and succceeds if there would have been a valid type `` for the input value that is a subtype of both types. -(As formulated, the rule would be non-deterministic in the choice of ``, but the intention is to pick the largest type that makes `f y` succeed if possible. We take the liberty to hand-wave over the formulation of this detail here.) -The effect of this rule is that decoders do not actually need to perform separate subtype checks during deserialisation. -Instead, they can simply try to deserialise, and if they encounter a mismatch abort to the innermost option type, returning `null`. +An optional value decodes at an option type, if the value decodes at the constituent type: +``` + :? ~> +----------------------- +opt :? opt ~> opt +``` -*Note:* The working assumption is that the type describing an incoming value ought to be *principal*, i.e., the most precise type assignable to the value (in a sense that could be made precise). -In that case, `` equals `` and deserialization succeeds exactly if the types match. +If an optional value _fails_ to decode at an optional type, the result is `null`, not failure: +``` + !: +----------------------- +opt :? opt ~> null +``` -However, in practice it would be costly to enforce the requirement that all type descriptions in a serialised value are principal, for both encoder (who would need to compute the principal type) and decoders (who would need to check it). -For example, to be principal, an empty vector would need to have type `vec empty`, `null` could only have type `null`, and a variant would only be allowed to include tags that actually occur in the value. -Consequently, the semantics allows for the possibility that the encoder ascribes a less specific type `` with some redundant information, and allows the decoder to ignore redundant information (by going to ``) like the element type of empty arrays, of null options, or unused variant tags. +Decoding a non-null, non-optional and non-reserved value at an option type implicitly wraps it in `opt`: +``` + ≠ null + ≠ (null : reserved) + ≠ opt _ +opt :? opt ~> +----------------------- + :? opt ~> +``` #### Records +Only records decode at record type. Missing fields of option type turn into `null`. + +In the following rule, the `*` field names are those present in both the value and the type, `*` field names only in the value, `` only in the type. +``` + :? ~> +-------------------------------------------------------------------------------------------------------------------------------------------- +record { = ;* = ;* } :? record { = ;* = opt ;* } ~> record { = ;* = null;* } ``` ------------------------------------------------- -record { ;* } <: record { } ~> \x.{} - <: ~> f1 -record { ;* } <: record { ;* } ~> f2 ----------------------------------------------------------------------------------------------- -record { : ; ;* } <: record { : ; ;* } - ~> \x.{f2 x with = f1 x.} +#### Variants - not in ;* -record { ;* } <: record { ;* } ~> f ------------------------------------------------------------------------------- -record { ;* } <: record { : opt ; ;* } - ~> \x.{f x with = null} +``` + :? ~> +---------------------------------------------------------------------------------- +variant { = } :? variant { = ;* _ } ~> variant { = } ``` -#### Variants +#### References + +Function and services references decode as they are +``` +------------------------------------------------------- +func . :? func ~> func . ``` -------------------------------------------------- -variant { } <: variant { ;* } ~> \x.x +``` +------------------------------------------------------- +service :? service ~> service +``` - <: ~> f1 -variant { ;* } <: variant { ;* } ~> f2 ------------------------------------------------------------------------------------------------- -variant { : ; ;* } <: variant { : ; ;* } - ~> \x.case x of y => (f1 y) | _ => f2 x +``` +------------------------------------------------------- +principal :? principal ~> principal ``` +#### Tuple types -#### Functions +Whole argument and result sequences are decoded with the same rules are tuple-like records. In particular, extra arguments are ignored, and optional arguments treated as `null` if decoding fails, and missing optional arguments treated at `null`: ``` -record { N1':;* } <: record { N1:;* } ~> f1 -record { N2:;* } <: record { N2':;* } ~> f2 ------------------------------------------------------------------------------------------------------------------- -func ( ,* ) -> ( ,* ) * <: func ( ,* ) -> ( ,* ) * - ~> \x.\y.f2 (x (f1 y)) +record {;*} :? record {;*} ~> record {,*} +--------------------------------------------------- +(,*) :? (,*) ~> (,*) ``` -#### Services -``` +## Properties -------------------------------------------------- -service { ;* } <: service { } ~> \x.{} +The relations above have certain properties. To express them, we need the relation `V : T`, expressing that `V` has inherently type `T`. Instead of listing the rules of that relation, we define it via the first property below; - <: ~> f1 -service { ;* } <: service { ;* } ~> f2 ------------------------------------------------------------------------------------------------- -service { : ; ;* } <: service { : ; ;* } - ~> \x.{f1 x; = f2 x.} -``` +* Correctness and completeness of decoding: + ``` + (v : T) ⟺ (v :? T ~> v) + ``` +* Uniqueness of decoding: + ``` + v :? T ~> v1, v :? T ~> v2 ⇒ v1 = v2 + ``` -## Open Questions +* Soundness of subtyping: + ``` + v : T, T <: T' ⇒ ∃ v'. v :? T ~> v' + ``` -* Support default field values? -* Support generic type definitions? -* Namespaces for imports? +* Transitivity of subtyping: + ``` + T <: T2, T2 <: T3 ⇒ T <: T3 + ``` + +* Transitive coherence does not hold: + ``` + T <: T2, T2 <: T3 + v1 : T + v :? T2 ~> v2 + v :? T3 ~> v3 + v2 :? T3 ~> v3' + ``` + does not imply `v3 = v3'`. + + Can prove that `R(v3,v3')` for the smallest homomorphic relation `R` that is reflexive and satsified `∀ v. R(opt v, null)`. + +Other design goals are not satisfied (or hard to express forally). + +* Completeness of subtyping: + + It seems the fomulation could be that if all values of a type `T` decode at type `T'`, then `T` is a subtype of `T'`. + ``` + (∀ v : T. ∃ v'. v :? T ~> v') =>, T <: T' + ``` + + But we don't have `opt empty <: null`, or `Empty <: t` where `type Empty = rec { Empty }`… ## Binary Format @@ -1195,17 +1252,6 @@ Note: * It is unspecified how references *r* are represented, neither internally nor externally. When binding to Wasm, their internal representation is expected to be based on Wasm reference types, i.e., `anyref` or subtypes thereof. It is up to the system how to represent or translate the reference table on the wire. -### Deserialisation - -Deserialisation is the parallel application of the inverse functions of `T`, `M`, and `R` defined above, with the following mechanism for robustness towards future extensions: - -* A serialised type may be headed by an opcode other than the ones defined above (i.e., less than -24). Any such opcode is followed by an LEB128-encoded count, and then a number of bytes corresponding to this count. A type represented that way is called a *future type*. - -* A value corresponding to a future type is called a *future value*. It is represented by two LEB128-encoded counts, *m* and *n*, followed by a *m* bytes in the memory representation M and accompanied by *n* corresponding references in R. - -These measures allow the serialisation format to be extended with new types in the future, as long as their representation and the representation of the corresponding values include a length prefix matching the above scheme, and thereby allowing an older deserialiser not understanding them to skip over them. The subtyping rules ensure that upgradability is maintained in this situation, i.e., an old deserialiser has no need to understand the encoded data. - - ### Parameters and Results `A` defines the argument mapping. Essentially, an argument list is serialised into the triple (T,M,R) as if it was a single closed record. T and M are combined into a single byte stream B, where they are preceded by the string "DIDL" as a magic number and a possible list of type definitions. @@ -1227,3 +1273,31 @@ The same representation is used for function results. Note: * It is unspecified how the pair (B,R) representing a serialised value is bundled together in an external environment. + +### Deserialisation + +Deserialisation at an expected type sequence `(,*)` proceeds by + + * checking for the magic number `DIDL` + * using the inverse of the `T` function to parse the type definitions `(*)` + * using the inverse of the `M` function, indexed by `ts`, to parse the values `(*)` + * use the decoding relation `(,*) :? (,*) ~> (,*)` to try to understand the parsed values at the expected type. + +### Deserialisation of future types + +Deserialisation uses the following mechanism for robustness towards future extensions: + +* A serialised type may be headed by an opcode other than the ones defined above (i.e., less than -24). Any such opcode is followed by an LEB128-encoded count, and then a number of bytes corresponding to this count. A type represented that way is called a *future type*. + +* A value corresponding to a future type is called a *future value*. It is represented by two LEB128-encoded counts, *m* and *n*, followed by a *m* bytes in the memory representation M and accompanied by *n* corresponding references in R. + +These measures allow the serialisation format to be extended with new types in the future, as long as their representation and the representation of the corresponding values include a length prefix matching the above scheme, and thereby allowing an older deserialiser not understanding them to skip over them. The subtyping rules ensure that upgradability is maintained in this situation, i.e., an old deserialiser has no need to understand the encoded data. + + +## Open Questions + +* Support default field values? +* Support generic type definitions? +* Namespaces for imports? + + From 4c9bc55ed049c03a7192e77113f20379b1383d9a Mon Sep 17 00:00:00 2001 From: Joachim Breitner Date: Mon, 2 Nov 2020 16:58:21 +0100 Subject: [PATCH 02/11] tidbits --- spec/Candid.md | 44 ++++++++++++++++++++++++-------------------- 1 file changed, 24 insertions(+), 20 deletions(-) diff --git a/spec/Candid.md b/spec/Candid.md index 9de30259b..5f56b6735 100644 --- a/spec/Candid.md +++ b/spec/Candid.md @@ -915,14 +915,14 @@ Any value decodes at type `reserved`, producing the canonical value of type `res :? reserved ~> null : reserved ``` -NB: No rule needed for type `empty`, because there are no values of that type. By definition, ` !: empty`. +NB: No rule needed for type `empty`, because there are no values of that type. By construction, ` !: empty`. #### Vectors Only vectors decode at vector types, and only if all elements decode successfully. ``` - :? ~> + :? ~> ------------------------------------------ vec { ;* } :? vec ~> vec { ;* } ``` @@ -937,19 +937,19 @@ null :? opt ~> null An optional value decodes at an option type, if the value decodes at the constituent type: ``` - :? ~> + :? ~> ----------------------- opt :? opt ~> opt ``` If an optional value _fails_ to decode at an optional type, the result is `null`, not failure: ``` - !: + !: ----------------------- opt :? opt ~> null ``` -Decoding a non-null, non-optional and non-reserved value at an option type implicitly wraps it in `opt`: +Decoding a non-null, non-optional and non-reserved value at an option type treats it as an optional value: ``` ≠ null ≠ (null : reserved) @@ -964,7 +964,7 @@ opt :? opt ~> Only records decode at record type. Missing fields of option type turn into `null`. -In the following rule, the `*` field names are those present in both the value and the type, `*` field names only in the value, `` only in the type. +In the following rule, the `*` field names are those present in both the value and the type, the `*` field names only those in the value, and `*` are those only in the type. ``` :? ~> -------------------------------------------------------------------------------------------------------------------------------------------- @@ -977,13 +977,13 @@ record { = ;* = ;* } :? record { = ;* = ``` :? ~> ---------------------------------------------------------------------------------- -variant { = } :? variant { = ;* _ } ~> variant { = } +variant { = } :? variant { = ; _;* } ~> variant { = } ``` #### References -Function and services references decode as they are +Function and services references decode unconditoinally ``` ------------------------------------------------------- @@ -996,13 +996,13 @@ service :? service ~> service ``` ``` -------------------------------------------------------- +------------------------------------------------- principal :? principal ~> principal ``` #### Tuple types -Whole argument and result sequences are decoded with the same rules are tuple-like records. In particular, extra arguments are ignored, and optional arguments treated as `null` if decoding fails, and missing optional arguments treated at `null`: +Whole argument and result sequences are decoded with the same rules are tuple-like records. In particular, extra arguments are ignored, and optional parameters read as as `null` if the argument is missing or fails to decode: ``` record {;*} :? record {;*} ~> record {,*} @@ -1013,11 +1013,16 @@ record {;*} :? record {;*} ~> record {,*} ## Properties -The relations above have certain properties. To express them, we need the relation `V : T`, expressing that `V` has inherently type `T`. Instead of listing the rules of that relation, we define it via the first property below; +The relations above have certain properties. To express them, we need the relation `V : T`, expressing that `V` has inherently type `T`. Instead of defining this relation on its own, we take the first property below as its definition: * Correctness and completeness of decoding: ``` - (v : T) ⟺ (v :? T ~> v) + (v : T) ⟺ (∃ v'. v' :? T ~> v) + ``` + +* Roundtripping: + ``` + (v : T) ⟺ v :? T ~> v ``` * Uniqueness of decoding: @@ -1039,24 +1044,23 @@ The relations above have certain properties. To express them, we need the relati ``` T <: T2, T2 <: T3 v1 : T - v :? T2 ~> v2 - v :? T3 ~> v3 + v1 :? T2 ~> v2 + v1 :? T3 ~> v3 v2 :? T3 ~> v3' ``` does not imply `v3 = v3'`. - Can prove that `R(v3,v3')` for the smallest homomorphic relation `R` that is reflexive and satsified `∀ v. R(opt v, null)`. + However, it implies that `R(v3,v3')`, where `R` is the smallest homomorphic, reflexive relation `R` that satisfies `∀ v. R(opt v, null)`. -Other design goals are not satisfied (or hard to express forally). +Other design goals are not satisfied (or hard to express formally?): * Completeness of subtyping: - It seems the fomulation could be that if all values of a type `T` decode at type `T'`, then `T` is a subtype of `T'`. + It seems the formulation could be that if all values of a type `T` decode at type `T'`, then `T` is a subtype of `T'`. ``` - (∀ v : T. ∃ v'. v :? T ~> v') =>, T <: T' + (∀ v : T. ∃ v'. v :? T ~> v') ⇒ T <: T' ``` - - But we don't have `opt empty <: null`, or `Empty <: t` where `type Empty = rec { Empty }`… + But for example we do not have `opt empty <: null`, or `Empty <: t` where `type Empty = rec { Empty }`. Nor do we relate arbitray function types. ## Binary Format From 63fc356095bdf108a554bfe4b80f20df5cb74e43 Mon Sep 17 00:00:00 2001 From: Joachim Breitner Date: Tue, 3 Nov 2020 08:18:23 +0100 Subject: [PATCH 03/11] Tweaks to Completeness of subtyping --- spec/Candid.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/spec/Candid.md b/spec/Candid.md index 5f56b6735..3fa0c1138 100644 --- a/spec/Candid.md +++ b/spec/Candid.md @@ -1056,11 +1056,13 @@ Other design goals are not satisfied (or hard to express formally?): * Completeness of subtyping: - It seems the formulation could be that if all values of a type `T` decode at type `T'`, then `T` is a subtype of `T'`. + A rigorous formulation of completeness could be that `<:` is the largest transitive and sound relation (in the above sense), i.e. that ``` - (∀ v : T. ∃ v'. v :? T ~> v') ⇒ T <: T' + (∀ v. v :? T ~> _ ⇒ v :? T' ~> _) ⇒ T <: T' ``` - But for example we do not have `opt empty <: null`, or `Empty <: t` where `type Empty = rec { Empty }`. Nor do we relate arbitray function types. + This does not hold as state, because of counter examples involving the empty type. For example we do not have `opt empty <: null`, or `Empty <: t` where `type Empty = rec { Empty }` + + Nor do we relate arbitray function types (where decoding always succeeds). ## Binary Format From 630523927707c363892b94f02e98639a300620ab Mon Sep 17 00:00:00 2001 From: Joachim Breitner Date: Tue, 3 Nov 2020 09:21:06 +0100 Subject: [PATCH 04/11] Link to IDL-Soundness.md --- spec/Candid.md | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/spec/Candid.md b/spec/Candid.md index 3fa0c1138..bb9d1db2d 100644 --- a/spec/Candid.md +++ b/spec/Candid.md @@ -1032,7 +1032,16 @@ The relations above have certain properties. To express them, we need the relati * Soundness of subtyping: ``` - v : T, T <: T' ⇒ ∃ v'. v :? T ~> v' + T <: T' ⇒ ∀ v : T. ∃ v'. v :? T ~> v' + ``` + +* Higher-order soundness of subtyping + See <./IDL-Soundness.md>, with the following instanstatiations: + ``` + s1 ~> s2 ≔ s2 <: s1 + t1 <: t2 ≔ (∀ v. t1: v :? t2 ~> _) + s1 in t1 <: s2 in t2 ≔ (to be done) + s1 <:h s2 ≔ (host-language dependent) ``` * Transitivity of subtyping: @@ -1044,9 +1053,8 @@ The relations above have certain properties. To express them, we need the relati ``` T <: T2, T2 <: T3 v1 : T - v1 :? T2 ~> v2 v1 :? T3 ~> v3 - v2 :? T3 ~> v3' + v1 :? T2 ~> v2, v2 :? T3 ~> v3' ``` does not imply `v3 = v3'`. From 4aa6b2fc8cca2b3aeabdcb937760a708e4a509af Mon Sep 17 00:00:00 2001 From: Joachim Breitner Date: Tue, 3 Nov 2020 12:25:32 +0100 Subject: [PATCH 05/11] Apply suggestions from code review Co-authored-by: Andreas Rossberg --- spec/Candid.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/spec/Candid.md b/spec/Candid.md index bb9d1db2d..c31a3a6b7 100644 --- a/spec/Candid.md +++ b/spec/Candid.md @@ -875,7 +875,7 @@ not (∃ v'. :? ~> v') !: ``` -Here `V` models an untyped values, and form an abstract data model of both the message in transit (`V`), as well as the the message . In the following, we re-use the syntax of the textual representation, with the following changes to make it free of overloading: +Here `V` models untyped values, which form an abstract data model of both the message in transit (`V`), as well as the the message. In the following, we re-use the syntax of the textual representation, with the following changes to make it free of overloading: * Number literals (``) must be immediately enclosed with an `` that clarifies the precise number type. * The value of type `reserved` is expressed as `(null : reserved)`. @@ -983,7 +983,7 @@ variant { = } :? variant { = ; _;* } ~> variant { = :? principal ~> principal #### Tuple types -Whole argument and result sequences are decoded with the same rules are tuple-like records. In particular, extra arguments are ignored, and optional parameters read as as `null` if the argument is missing or fails to decode: +Whole argument and result sequences are decoded with the same rules as tuple-like records. In particular, extra arguments are ignored, and optional parameters read as as `null` if the argument is missing or fails to decode: ``` record {;*} :? record {;*} ~> record {,*} @@ -1046,13 +1046,13 @@ The relations above have certain properties. To express them, we need the relati * Transitivity of subtyping: ``` - T <: T2, T2 <: T3 ⇒ T <: T3 + T1 <: T2, T2 <: T3 ⇒ T1 <: T3 ``` * Transitive coherence does not hold: ``` - T <: T2, T2 <: T3 - v1 : T + T1 <: T2, T2 <: T3 + v1 : T1 v1 :? T3 ~> v3 v1 :? T2 ~> v2, v2 :? T3 ~> v3' ``` @@ -1314,4 +1314,3 @@ These measures allow the serialisation format to be extended with new types in t * Support generic type definitions? * Namespaces for imports? - From 03d895063b6498af927ebd186b8d5fa9ccf5ae94 Mon Sep 17 00:00:00 2001 From: Joachim Breitner Date: Tue, 3 Nov 2020 12:53:11 +0100 Subject: [PATCH 06/11] Allow removal of `reserved`, fixes #131 --- spec/Candid.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/spec/Candid.md b/spec/Candid.md index c31a3a6b7..ba5b6a9bf 100644 --- a/spec/Candid.md +++ b/spec/Candid.md @@ -807,12 +807,13 @@ record { ;* } <: record { ;* } record { : ; ;* } <: record { : ; ;* } ``` -In order to be able to evolve and extend record types that also occur in inbound position (i.e., are used both as function results and function parameters), the subtype relation also supports *removing* fields from records, provided they are optional. +In order to be able to evolve and extend record types that also occur in inbound position (i.e., are used both as function results and function parameters), the subtype relation also supports *removing* fields from records, provided they are optional (or `reserved`). ``` not in ;* record { ;* } <: record { ;* } +opt empty <: ------------------------------------------------------------------------------ -record { ;* } <: record { : opt ; ;* } +record { ;* } <: record { : ; ;* } ``` *Note:* This rule is unusual from a regular subtyping perspective, but necessary in practice. Together with the previous rule, it allows extending any record with optional fields in an upgrade, regardless of how it is used. From e49e2dfc75bd5636a44f1d84c6b50b70a612e31a Mon Sep 17 00:00:00 2001 From: Joachim Breitner Date: Tue, 3 Nov 2020 13:22:20 +0100 Subject: [PATCH 07/11] =?UTF-8?q?Different=20relation,=20,=20fix?= =?UTF-8?q?=20record=20field=20and=20`reserved`=E2=80=A6?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- spec/Candid.md | 164 +++++++++++++++++++++++++------------------------ 1 file changed, 84 insertions(+), 80 deletions(-) diff --git a/spec/Candid.md b/spec/Candid.md index ba5b6a9bf..2a4f75d4a 100644 --- a/spec/Candid.md +++ b/spec/Candid.md @@ -77,15 +77,18 @@ This is a summary of the grammar proposed: ::= | | | ::= - | nat | nat8 | nat16 | nat32 | nat64 - | int | int8 | int16 | int32 | int64 - | float32 | float64 + | | bool | text | null | reserved | empty + ::= + | nat | nat8 | nat16 | nat32 | nat64 + | int | int8 | int16 | int32 | int64 + | float32 | float64 + ::= | opt | vec @@ -230,13 +233,17 @@ The content of message arguments and results is *data*. Three basic forms of *da *Primitive types* describe the possible forms of primitive data. +The primitive types that describe numbers are separted out in the grammar: +``` + ::= | ... +``` #### Natural Numbers The type `nat` describes a natural number (unsigned integer) of unlimited range. There are also variants limited to 8, 16, 32, or 64 bit value range with fixed-size representations. ``` - ::= nat | nat8 | nat16 | nat32 | nat64 | ... + ::= nat | nat8 | nat16 | nat32 | nat64 | ... ``` **Note:** Values of type `nat` have variable length representations in the binary serialisation format, and hence take up space proportional to (the logarithm of) their value. As long as typical values are small, they may hence be more space-efficient than the fixed size types. @@ -245,7 +252,7 @@ The type `nat` describes a natural number (unsigned integer) of unlimited range. The type `int` describes an integer number (signed) of unlimited range. There are also variants limited to 8, 16, 32, or 64 bit value range with fixed-size representations. ``` - ::= ... | int | int8 | int16 | int32 | int64 | ... + ::= ... | int | int8 | int16 | int32 | int64 | ... ``` **Note:** Values of type `nat` have variable length representations in the binary serialisation format, and hence take up space proportional to (the logarithm of) their value. As long as typical values are small, they may hence be more space-efficient than the fixed size types. @@ -254,7 +261,7 @@ The type `int` describes an integer number (signed) of unlimited range. There ar Floating-point values are represented in IEEE 754 binary format and are supported in single precision (32 bit) and double precision (64 bit). ``` - ::= ... | float32 | float64 | ... + ::= ... | float32 | float64 ``` #### Boolean @@ -865,150 +872,147 @@ service { ;* } <: service { ;* } service { : ; ;* } <: service { : ; ;* } ``` -### Decoding +### Coercing -This subtyping is implemented when decoding a Candid value. We provide a ternary relation `V :? T ~> V'` to describe when a value `V` can be decoded at type `T` to a value `V'` of type `T`. +This subtyping is implemented during the deserialization of Candid at an expected type. As as described in [Section Deserialisation](#deserialization), the binary value is conceptually first _decoded_ into an abstract value, and then _coerced_ to the expected type. -The binary relation `V !: T` describes that a value cannot be decoded at type `T`, and is defined as -``` -not (∃ v'. :? ~> v') ------- - !: -``` +This section describes the covercion, as ternary relation `V ~> V' : T` to describe when a value `V` can be coerced to a value `V'` of type `T`. The fields `V` and `T` can be understood as inputs and `V'` as the output of this relation. -Here `V` models untyped values, which form an abstract data model of both the message in transit (`V`), as well as the the message. In the following, we re-use the syntax of the textual representation, with the following changes to make it free of overloading: +Here `V` models untyped values, which form an abstract data model of both the message in transit (`V`), as well as the result of the coercion (`V'`). In the following, we re-use the syntax of the textual representation, with the following changes to make it free of overloading: * Number literals (``) must be immediately enclosed with an `` that clarifies the precise number type. - * The value of type `reserved` is expressed as `(null : reserved)`. + * The canonical value of type `reserved` is expressed as `(null : reserved)`. * No other use of ``. #### Primitive Types -Values of primitive types decode successfully at their own type: +Values of primitive types coerce successfully at their own type: ``` - ∈ {nat, int, nat8, nat16, nat32, nat64, int8, int16, int32, int64, float32, float64 } -------------------------------------- -( : ) :? ~> ( : ) +-------------------------------------------------- +( : ) ~> ( : ) : --------------------- -true :? bool ~> true +------------------- +true ~> true : bool ---------------------- -false :? bool ~> false +false ~> false : bool ------------------------- - :? text ~> +----------------------- + ~> : text ------------------------ null :? null ~> null ``` -Values of type `nat` decode at type `int`: +Values of type `nat` coerce successfully at type `int`: ``` ---------------------------------- -( : nat) :? int ~> ( : int) +-------------------------------- +( : nat) ~> ( : int) : int ``` -Any value decodes at type `reserved`, producing the canonical value of type `reserved`: +Any value coerces at type `reserved`, producing the canonical value of type `reserved`: ``` ------------------------------- - :? reserved ~> null : reserved +----------------------------------- + ~> (null : reserved) : reserved ``` -NB: No rule needed for type `empty`, because there are no values of that type. By construction, ` !: empty`. +NB: No rule is needed for type `empty`, because there are no values of that type. By construction, ` ~> _ : empty` will not hold. #### Vectors -Only vectors decode at vector types, and only if all elements decode successfully. +Only vectors coerce at vector types, and only if all elements coerce successfully. ``` - :? ~> ------------------------------------------- -vec { ;* } :? vec ~> vec { ;* } + ~> : +----------------------------------------- +vec { ;* } ~> vec { ;* } : vec ``` #### Options -The null value decodes at any option type: +The null value coerce at any option type: ``` ------------------------ -null :? opt ~> null +---------------------- +null ~> null : opt ``` -An optional value decodes at an option type, if the value decodes at the constituent type: +An optional value coerces at an option type, if the constituent value coerces at the constituent type: ``` - :? ~> ------------------------ -opt :? opt ~> opt + ~> : +----------------------------- +opt ~> opt : opt ``` -If an optional value _fails_ to decode at an optional type, the result is `null`, not failure: +If an optional value _fails_ to coerce at an optional type, the result is `null`, not failure: ``` - !: ------------------------ -opt :? opt ~> null +not ( ~> _ : ) +------------------------- +opt ~> null : opt ``` -Decoding a non-null, non-optional and non-reserved value at an option type treats it as an optional value: +Coercing a non-null, non-optional and non-reserved value at an option type treats it as an optional value: ``` ≠ null ≠ (null : reserved) ≠ opt _ -opt :? opt ~> ------------------------ - :? opt ~> +opt ~> : opt +------------------------- + ~> : opt ``` #### Records -Only records decode at record type. Missing fields of option type turn into `null`. +Only records coerce at record type. Missing fields of option or reserved type turn into `null`. In the following rule, the `*` field names are those present in both the value and the type, the `*` field names only those in the value, and `*` are those only in the type. ``` - :? ~> --------------------------------------------------------------------------------------------------------------------------------------------- -record { = ;* = ;* } :? record { = ;* = opt ;* } ~> record { = ;* = null;* } + ~> : + = opt _ ∨ = reserved +--------------------------------------------------------------------------------------------------------------------------------------- +record { = ;* = ;* } ~> record { = ;* = null;* } : record { = ;* = ;* } ``` #### Variants +Only a variant value with an expected tag coerces at variant type. + ``` - :? ~> + ~> : ---------------------------------------------------------------------------------- -variant { = } :? variant { = ; _;* } ~> variant { = } +variant { = } ~> variant { = } : variant { = ; _;* } ``` #### References -Function and services references decode unconditionally +Function and services references coerce unconditionally ``` -------------------------------------------------------- -func . :? func ~> func . +------------------------------------------------------ +func . ~> func . : func ``` ``` -------------------------------------------------------- -service :? service ~> service +------------------------------------------------------ +service ~> service : service ``` ``` -------------------------------------------------- -principal :? principal ~> principal +------------------------------------------------ +principal ~> principal : principal ``` #### Tuple types -Whole argument and result sequences are decoded with the same rules as tuple-like records. In particular, extra arguments are ignored, and optional parameters read as as `null` if the argument is missing or fails to decode: +Whole argument and result sequences are coerced with the same rules as tuple-like records. In particular, extra arguments are ignored, and optional parameters read as as `null` if the argument is missing or fails to coerce: ``` -record {;*} :? record {;*} ~> record {,*} ---------------------------------------------------- -(,*) :? (,*) ~> (,*) +record {;*} ~> record {,*} : record {;*} +-------------------------------------------------- +(,*) ~> (,*) : (,*) ``` @@ -1018,29 +1022,29 @@ The relations above have certain properties. To express them, we need the relati * Correctness and completeness of decoding: ``` - (v : T) ⟺ (∃ v'. v' :? T ~> v) + (v : T) ⟺ (∃ v'. v' ~> v : T) ``` * Roundtripping: ``` - (v : T) ⟺ v :? T ~> v + (v : T) ⟺ v ~> v : T ``` * Uniqueness of decoding: ``` - v :? T ~> v1, v :? T ~> v2 ⇒ v1 = v2 + v ~> v1 : T, v ~> v2 : T ⇒ v1 = v2 ``` * Soundness of subtyping: ``` - T <: T' ⇒ ∀ v : T. ∃ v'. v :? T ~> v' + T <: T' ⇒ ∀ v : T. ∃ v'. v ~> v' : T ``` * Higher-order soundness of subtyping - See <./IDL-Soundness.md>, with the following instanstatiations: + See <./IDL-Soundness.md>, with the following instantiations: ``` s1 ~> s2 ≔ s2 <: s1 - t1 <: t2 ≔ (∀ v. t1: v :? t2 ~> _) + t1 <: t2 ≔ (∀ v. v : t1 ⇒ v ~> _:? t2 ) s1 in t1 <: s2 in t2 ≔ (to be done) s1 <:h s2 ≔ (host-language dependent) ``` @@ -1054,8 +1058,8 @@ The relations above have certain properties. To express them, we need the relati ``` T1 <: T2, T2 <: T3 v1 : T1 - v1 :? T3 ~> v3 - v1 :? T2 ~> v2, v2 :? T3 ~> v3' + v1 ~> v3 : T3 + v1 ~> v2 : T2, v2 ~> v2 : T3 ``` does not imply `v3 = v3'`. @@ -1067,7 +1071,7 @@ Other design goals are not satisfied (or hard to express formally?): A rigorous formulation of completeness could be that `<:` is the largest transitive and sound relation (in the above sense), i.e. that ``` - (∀ v. v :? T ~> _ ⇒ v :? T' ~> _) ⇒ T <: T' + (∀ v. v ~> _ : T ⇒ v' ~> _ : T') ⇒ T <: T' ``` This does not hold as state, because of counter examples involving the empty type. For example we do not have `opt empty <: null`, or `Empty <: t` where `type Empty = rec { Empty }` @@ -1294,9 +1298,9 @@ Note: Deserialisation at an expected type sequence `(,*)` proceeds by * checking for the magic number `DIDL` - * using the inverse of the `T` function to parse the type definitions `(*)` - * using the inverse of the `M` function, indexed by `ts`, to parse the values `(*)` - * use the decoding relation `(,*) :? (,*) ~> (,*)` to try to understand the parsed values at the expected type. + * using the inverse of the `T` function to decode the type definitions `(*)` + * using the inverse of the `M` function, indexed by `ts`, to decode the values `(*)` + * use the coercion relation `(,*) ~> (,*) : (,*)` to try to understand the decoded values at the expected type. ### Deserialisation of future types From f52fe2a12dd774e6b8447667c553dd5f37df2388 Mon Sep 17 00:00:00 2001 From: Joachim Breitner Date: Tue, 3 Nov 2020 13:51:41 +0100 Subject: [PATCH 08/11] whitespace --- spec/Candid.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/spec/Candid.md b/spec/Candid.md index 2a4f75d4a..26774343a 100644 --- a/spec/Candid.md +++ b/spec/Candid.md @@ -894,13 +894,13 @@ Values of primitive types coerce successfully at their own type: ------------------- true ~> true : bool ----------------------- +--------------------- false ~> false : bool ----------------------- ~> : text ------------------------- +-------------------- null :? null ~> null ``` @@ -924,7 +924,7 @@ NB: No rule is needed for type `empty`, because there are no values of that type Only vectors coerce at vector types, and only if all elements coerce successfully. ``` - ~> : + ~> : ----------------------------------------- vec { ;* } ~> vec { ;* } : vec ``` From 4d2a742002d3bb7da3f36feb8f162cf1347ae9d8 Mon Sep 17 00:00:00 2001 From: Joachim Breitner Date: Tue, 3 Nov 2020 15:44:52 +0100 Subject: [PATCH 09/11] Apply suggestions from code review Co-authored-by: Andreas Rossberg --- spec/Candid.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/spec/Candid.md b/spec/Candid.md index 26774343a..d960bc59d 100644 --- a/spec/Candid.md +++ b/spec/Candid.md @@ -872,11 +872,11 @@ service { ;* } <: service { ;* } service { : ; ;* } <: service { : ; ;* } ``` -### Coercing +### Coercion -This subtyping is implemented during the deserialization of Candid at an expected type. As as described in [Section Deserialisation](#deserialization), the binary value is conceptually first _decoded_ into an abstract value, and then _coerced_ to the expected type. +This subtyping is implemented during the deserialization of Candid at an expected type. As described in [Section Deserialisation](#deserialization), the binary value is conceptually first _decoded_ into an abstract value, and then _coerced_ to the expected type. -This section describes the covercion, as ternary relation `V ~> V' : T` to describe when a value `V` can be coerced to a value `V'` of type `T`. The fields `V` and `T` can be understood as inputs and `V'` as the output of this relation. +This section describes the covercion, as a ternary relation `V ~> V' : T` to describe when a value `V` can be coerced to a value `V'` of type `T`. The fields `V` and `T` can be understood as inputs and `V'` as the output of this relation. Here `V` models untyped values, which form an abstract data model of both the message in transit (`V`), as well as the result of the coercion (`V'`). In the following, we re-use the syntax of the textual representation, with the following changes to make it free of overloading: @@ -900,8 +900,8 @@ false ~> false : bool ----------------------- ~> : text --------------------- -null :? null ~> null +------------------- +null ~> null : null ``` Values of type `nat` coerce successfully at type `int`: @@ -969,7 +969,7 @@ Only records coerce at record type. Missing fields of option or reserved type tu In the following rule, the `*` field names are those present in both the value and the type, the `*` field names only those in the value, and `*` are those only in the type. ``` ~> : - = opt _ ∨ = reserved +opt empty <: --------------------------------------------------------------------------------------------------------------------------------------- record { = ;* = ;* } ~> record { = ;* = null;* } : record { = ;* = ;* } ``` @@ -1032,7 +1032,7 @@ The relations above have certain properties. To express them, we need the relati * Uniqueness of decoding: ``` - v ~> v1 : T, v ~> v2 : T ⇒ v1 = v2 + v ~> v1 : T ∧ v ~> v2 : T ⇒ v1 = v2 ``` * Soundness of subtyping: @@ -1044,7 +1044,7 @@ The relations above have certain properties. To express them, we need the relati See <./IDL-Soundness.md>, with the following instantiations: ``` s1 ~> s2 ≔ s2 <: s1 - t1 <: t2 ≔ (∀ v. v : t1 ⇒ v ~> _:? t2 ) + t1 <: t2 ≔ (∀ v. v : t1 ⇒ v ~> _ : t2 ) s1 in t1 <: s2 in t2 ≔ (to be done) s1 <:h s2 ≔ (host-language dependent) ``` @@ -1318,4 +1318,3 @@ These measures allow the serialisation format to be extended with new types in t * Support default field values? * Support generic type definitions? * Namespaces for imports? - From e8d564dab3b6188c945fca4d3748c1ea8d85f258 Mon Sep 17 00:00:00 2001 From: Joachim Breitner Date: Tue, 3 Nov 2020 15:51:25 +0100 Subject: [PATCH 10/11] Remove section on completeness --- spec/Candid.md | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/spec/Candid.md b/spec/Candid.md index d960bc59d..06afada50 100644 --- a/spec/Candid.md +++ b/spec/Candid.md @@ -1065,18 +1065,7 @@ The relations above have certain properties. To express them, we need the relati However, it implies that `R(v3,v3')`, where `R` is the smallest homomorphic, reflexive relation `R` that satisfies `∀ v. R(opt v, null)`. -Other design goals are not satisfied (or hard to express formally?): - -* Completeness of subtyping: - - A rigorous formulation of completeness could be that `<:` is the largest transitive and sound relation (in the above sense), i.e. that - ``` - (∀ v. v ~> _ : T ⇒ v' ~> _ : T') ⇒ T <: T' - ``` - This does not hold as state, because of counter examples involving the empty type. For example we do not have `opt empty <: null`, or `Empty <: t` where `type Empty = rec { Empty }` - - Nor do we relate arbitray function types (where decoding always succeeds). - +The goal of “subtyping completeness” has not been cast into a formal formulation yet. ## Binary Format From e349bdb601d1b14dfb562a27150e31f19257e026 Mon Sep 17 00:00:00 2001 From: Joachim Breitner Date: Wed, 4 Nov 2020 10:15:33 +0100 Subject: [PATCH 11/11] Apply suggestions from code review Co-authored-by: Yan Chen <48968912+chenyan-dfinity@users.noreply.github.com> --- spec/Candid.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/spec/Candid.md b/spec/Candid.md index 06afada50..e54267556 100644 --- a/spec/Candid.md +++ b/spec/Candid.md @@ -876,7 +876,7 @@ service { : ; ;* } <: service { : ; This subtyping is implemented during the deserialization of Candid at an expected type. As described in [Section Deserialisation](#deserialization), the binary value is conceptually first _decoded_ into an abstract value, and then _coerced_ to the expected type. -This section describes the covercion, as a ternary relation `V ~> V' : T` to describe when a value `V` can be coerced to a value `V'` of type `T`. The fields `V` and `T` can be understood as inputs and `V'` as the output of this relation. +This section describes the coercion, as a ternary relation `V ~> V' : T` to describe when a value `V` can be coerced to a value `V'` of type `T`. The fields `V` and `T` can be understood as inputs and `V'` as the output of this relation. Here `V` models untyped values, which form an abstract data model of both the message in transit (`V`), as well as the result of the coercion (`V'`). In the following, we re-use the syntax of the textual representation, with the following changes to make it free of overloading: @@ -1288,7 +1288,7 @@ Deserialisation at an expected type sequence `(,*)` proceeds by * checking for the magic number `DIDL` * using the inverse of the `T` function to decode the type definitions `(*)` - * using the inverse of the `M` function, indexed by `ts`, to decode the values `(*)` + * using the inverse of the `M` function, indexed by `(*)`, to decode the values `(*)` * use the coercion relation `(,*) ~> (,*) : (,*)` to try to understand the decoded values at the expected type. ### Deserialisation of future types