From 779664cd820a76db53f397a20488540b6438e734 Mon Sep 17 00:00:00 2001
From: Conner Fromknecht <conner@lightning.engineering>
Date: Mon, 29 Apr 2019 19:35:56 -0700
Subject: [PATCH] BOLT01: add TLV spec

---
 .aspell.en.pws  | 11 ++++++
 01-messaging.md | 89 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 100 insertions(+)

diff --git a/.aspell.en.pws b/.aspell.en.pws
index b1a1ad307..1a6b5d4ca 100644
--- a/.aspell.en.pws
+++ b/.aspell.en.pws
@@ -330,3 +330,14 @@ zlib
 ZLIB
 APIs
 duplicative
+TLV
+namespace
+verifier
+verifiers
+EOF
+monotonicity
+varint
+optimizations
+structs
+CompactSize
+encodings
diff --git a/01-messaging.md b/01-messaging.md
index 9ff3d55fd..5475b6d68 100644
--- a/01-messaging.md
+++ b/01-messaging.md
@@ -13,6 +13,7 @@ All data fields are unsigned big-endian unless otherwise specified.
 
   * [Connection Handling and Multiplexing](#connection-handling-and-multiplexing)
   * [Lightning Message Format](#lightning-message-format)
+  * [Type-Length-Value Format](#type-length-value-format)
   * [Setup Messages](#setup-messages)
     * [The `init` Message](#the-init-message)
     * [The `error` Message](#the-error-message)
@@ -82,6 +83,94 @@ however, adding a 6-byte padding after the type field was considered
 wasteful: alignment may be achieved by decrypting the message into
 a buffer with 6-bytes of pre-padding.
 
+
+## Type-Length-Value Format
+
+Throughout the protocol, a TLV (Type-Length-Value) format is used to allow for
+the backwards-compatible addition of new fields to existing message types.
+
+A `tlv_record` represents a single field, encoded in the form:
+
+* [`varint`: `type`]
+* [`varint`: `length`]
+* [`length`: `value`]
+
+A `tlv_stream` is a series of (possibly zero) `tlv_record`s, represented as the
+concatenation of the encoded `tlv_record`s. When used to extend existing
+messages, a `tlv_stream` is typically placed after all currently defined fields.
+
+The `type` is a varint encoded using the bitcoin CompactSize format. It
+functions as a message-specific, 64-bit identifier for the `tlv_record`
+determining how the contents of `value` should be decoded.
+
+The `length` is a varint encoded using the bitcoin CompactSize format
+signaling the size of `value` in bytes.
+
+The `value` depends entirely on the `type`, and should be encoded or decoded
+according to the message-specific format determined by `type`.
+
+### Requirements
+
+The sending node:
+ - MUST order `tlv_record`s in a `tlv_stream` by monotonically-increasing `type`.
+ - MUST minimally encode `type` and `length`.
+ - SHOULD NOT use redundant, variable-length encodings in a `tlv_record`.
+
+The receiving node:
+ - if zero bytes remain before parsing a `type`:
+   - MUST stop parsing the `tlv_stream`.
+ - if a `type` or `length` is not minimally encoded:
+   - MUST fail to parse the `tlv_stream`.
+ - if decoded `type`s are not monotonically-increasing:
+   - MUST fail to parse the `tlv_stream`.
+ - if `type` is known:
+   - MUST decode the next `length` bytes using the known encoding for `type`.
+ - otherwise, if `type` is unknown:
+   - if `type` is even:
+     - MUST fail to parse the `tlv_stream`.
+   - otherwise, if `type` is odd:
+     - MUST discard the next `length` bytes.
+
+### Rationale
+
+The primary advantage in using TLV is that a reader is able to ignore new fields
+that it does not understand, since each field carries the exact size of the
+encoded element. Without TLV, even if a node does not wish to use a particular
+field, the node is forced to add parsing logic for that field in order to
+determine the offset of any fields that follow.
+
+The monotonicity constraint ensures that all `type`s are unique and can appear
+at most once. Fields that map to complex objects, e.g. vectors, maps, or
+structs, should do so by defining the encoding such that the object is
+serialized within a single `tlv_record`. The uniqueness constraint, among other
+things, enables the following optimizations:
+ - canonical ordering is defined independent of the encoded `value`s.
+ - canonical ordering can be known at compile-time, rather that being determined
+   dynamically at the time of encoding.
+ - verifying canonical ordering requires less state and is less-expensive.
+ - variable-size fields can reserve their expected size up front, rather than
+   appending elements sequentially and incurring double-and-copy overhead.
+
+The use of a varint for `type` and `length` permits a space savings for small
+`type`s or short `value`s. This potentially leaves more space for application
+data over the wire or in an onion payload.
+
+All `type`s must appear in increasing order to create a canonical encoding of
+the underlying `tlv_record`s. This is crucial when computing signatures over a
+`tlv_stream`, as it ensures verifiers will be able to recompute the same message
+digest as the signer. Note that the canonical ordering over the set of fields
+can be enforced even if the verifier does not understand what the fields
+contain.
+
+Writers should avoid using redundant, variable-length encodings in a
+`tlv_record` since this results in encoding the length twice and complicates
+computing the outer length. As an example, when writing a variable length byte
+array, the `value` should contain only the raw bytes and forgo an additional
+internal length since the `tlv_record` already carries the number of bytes that
+follow. On the other hand, if a `tlv_record` contains multiple, variable-length
+elements then this would not be considered redundant, and is needed to allow the
+receiver to parse individual elements from `value`.
+
 ## Setup Messages
 
 ### The `init` Message