diff --git a/.migration/learn-hub/how-does-icp-work/blockchain-protocol/blockchain-protocol.md b/.migration/learn-hub/how-does-icp-work/blockchain-protocol/blockchain-protocol.md deleted file mode 100644 index 7a1a8e3..0000000 --- a/.migration/learn-hub/how-does-icp-work/blockchain-protocol/blockchain-protocol.md +++ /dev/null @@ -1,34 +0,0 @@ ---- -learn_hub_id: 34206453538964 -learn_hub_url: "https://learn.internetcomputer.org/hc/en-us/articles/34206453538964-Blockchain-Protocol" -learn_hub_title: "Blockchain Protocol" -learn_hub_section: "Blockchain Protocol" -learn_hub_category: "How does ICP work?" -migrated: false ---- - -# Blockchain Protocol - -The Internet Computer is created by the Internet Computer Protocol (ICP), from which its utility token, the ICP token, derives its name. The Internet Computer consists of multiple subnets, with each subnet created by its own instance of a blockchain protocol stack. Each subnet hosts canister smart contracts and executes messages sent to them either by users or other canister smart contracts (which may be hosted on the same or another subnet). Messages on the IC are analogous to transactions on other blockchains. Messages addressed to a canister smart contract are executed by the nodes on the corresponding subnet by running the code of the canister. Canister code execution updates the canister state. To keep the state in sync across the subnet nodes on which a canister is hosted, it must be ensured that every node executes the same messages in the same order, i.e., fully deterministically. This is the core of the blockchain-based replicated state machine functionality realizing the Internet Computer. - -Each node on the Internet Computer runs a replica process. The replica process is structured in a layered architecture consisting of the following four layers: - - 1. [Peer-to-peer](https://learn.internetcomputer.org/hc/en-us/articles/34207428453140) - 2. [Consensus](https://learn.internetcomputer.org/hc/en-us/articles/34207558615956) - 3. [Message routing](https://learn.internetcomputer.org/hc/en-us/articles/34208241927316) - 4. [Execution](https://learn.internetcomputer.org/hc/en-us/articles/34208985618836) - - - -![4-layer architecture of the Internet Computer](https://csojb-wiaaa-aaaal-qjftq-cai.icp0.io/_astro/core_protocol_layers.Q9HZPKLE_Z1WJp60.webp) - -_4-layer architecture of the Internet Computer_ - -The **peer-to-peer** layer is responsible for accepting messages from users and exchanging messages between nodes in a subnet. The **consensus** layer makes all the nodes on the subnet agree on the messages to be processed, as well as their ordering. The **message routing** layer picks up the finalized blocks from the consensus layer and routes the messages in the blocks to the appropriate canisters. The **execution** layer deterministically executes canister code on the messages received from the message routing layer. - -The upper two layers realize deterministic execution of the block of messages for a round received from the lower two layers, on each node of the subnet. At the beginning of a round, all (honest) nodes hold the same state, representing the replicated state of the subnet, which includes the current state of all canisters hosted on that subnet. By executing the messages of the next block received from consensus in a completely deterministic manner, it is ensured that the state after executing the messages of the block is the same on each node. 
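To make the replicated-state-machine idea concrete, here is a toy sketch in Rust. It is purely illustrative (a real subnet's state is far richer than a counter map): two nodes that start from the same state and apply the same finalized block of messages through a deterministic transition function necessarily end in the same state.

```rust
use std::collections::BTreeMap;

/// Toy replicated state: a map from canister ID to a counter.
type State = BTreeMap<u64, i64>;

/// A toy message addressed to one canister.
struct Message {
    canister_id: u64,
    delta: i64,
}

/// Deterministic transition function: the same state and message
/// always yield the same next state, on every node.
fn execute(state: &mut State, msg: &Message) {
    *state.entry(msg.canister_id).or_insert(0) += msg.delta;
}

fn main() {
    // The same finalized block of ordered messages, as agreed by consensus.
    let block = vec![
        Message { canister_id: 1, delta: 5 },
        Message { canister_id: 2, delta: -3 },
        Message { canister_id: 1, delta: 1 },
    ];

    // Two "nodes" start from identical state and apply the block in order.
    let mut node_a = State::new();
    let mut node_b = State::new();
    for msg in &block {
        execute(&mut node_a, msg);
        execute(&mut node_b, msg);
    }

    // Determinism guarantees identical replicated state afterwards.
    assert_eq!(node_a, node_b);
}
```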
- -Canister smart contracts can communicate with each other by sending messages, regardless of whether they are hosted on the same or different subnets. The IC core protocol handles both inter-canister messages sent locally, i.e., between canisters on the same subnet, and inter-canister messages sent across subnets, so-called XNet (or _cross-net_) messages. Local inter-canister messages do not need to go through consensus, while XNet inter-canister messages do (making the former more efficient in terms of both throughput and latency). - -To allow nodes to efficiently join a subnet that is already running or to catch up with the current state in case they have been offline for some time, the protocol supports [state synchronization](https://learn.internetcomputer.org/hc/en-us/articles/34471579767572) without processing all messages that have ever been executed. - diff --git a/.migration/learn-hub/how-does-icp-work/blockchain-protocol/consensus.md b/.migration/learn-hub/how-does-icp-work/blockchain-protocol/consensus.md deleted file mode 100644 index c6115c4..0000000 --- a/.migration/learn-hub/how-does-icp-work/blockchain-protocol/consensus.md +++ /dev/null @@ -1,69 +0,0 @@ ---- -learn_hub_id: 34207558615956 -learn_hub_url: "https://learn.internetcomputer.org/hc/en-us/articles/34207558615956-Consensus" -learn_hub_title: "Consensus" -learn_hub_section: "Blockchain Protocol" -learn_hub_category: "How does ICP work?" -migrated: false ---- - -# Consensus - -The consensus protocol allows the nodes to agree on the messages to be processed, as well as their ordering. The nodes in each ICP subnet run their own instance of the consensus protocol, independently of the other subnets. The purpose of the consensus protocol is to output the same block of ordered messages on each node of a subnet in a given round so that each node can make the same state transition when deterministically executing those messages. - -ICP’s consensus protocol is designed to meet the following requirements: low latency (almost instant finality); high throughput; robustness (graceful degradation of latency and throughput in the presence of node or network failures). - -The consensus protocol provides cryptographically guaranteed finality. Probabilistic finality – similar to what is done in Bitcoin-like protocols, where a block is considered final once a sufficient number of blocks have built on top of it in the blockchain – is not sufficient for ICP for two reasons: (1) probabilistic finality is a very weak notion of finality and (2) probabilistic finality would increase the time to finality drastically. - -The IC consensus protocol achieves all of these goals while making only minimal assumptions about the communication network. In particular, safety of the protocol does not depend on any bounds on the time it takes for protocol messages to be delivered – that is, it only assumes an asynchronous network rather than a synchronous network. Indeed, for a decentralized network that is globally distributed, synchrony is simply not a realistic assumption. To achieve good latency, the IC consensus protocol requires protocol messages to be delivered in a timely manner to make progress. While it is possible to design consensus protocols that work in a purely asynchronous setting, these protocols generally have very poor latency. However, the correctness of the protocol is always guaranteed, regardless of message delays, as long as less than a third of the nodes in the subnet are faulty. 
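The fault bound and the "two thirds" quorums used throughout this article reduce to a few lines of integer arithmetic. A hedged sketch (the exact rounding the replica uses may differ):

```rust
/// Byzantine fault-tolerance arithmetic for a subnet of `n` nodes.
/// Illustrative only, not replica code.

/// Maximum number of faulty nodes tolerated: f = floor((n - 1) / 3),
/// i.e., safety holds as long as fewer than a third of nodes are faulty.
fn max_faulty(n: usize) -> usize {
    (n - 1) / 3
}

/// A "two thirds of the nodes" quorum, rounded up: ceil(2n / 3).
fn quorum(n: usize) -> usize {
    (2 * n + 2) / 3
}

fn main() {
    for n in [13, 28, 40] {
        println!(
            "subnet of {n} nodes: tolerates {} faults, needs {} supporting nodes",
            max_faulty(n),
            quorum(n)
        );
    }
}
```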
- -![Consensus round yields an ordered sequence of messages](https://csojb-wiaaa-aaaal-qjftq-cai.icp0.io/_astro/consensus_orders_messages.CPiCaIlB_27rmgz.webp) - -The consensus protocol maintains a tree of notarized blocks (with a special origin block at the root). The protocol proceeds in rounds. In each round, at least one notarized block is added to the tree as a child of a notarized block that was added in the previous round. When things go right, there will be only one notarized block added to the tree in that round, and that block will be marked as finalized. Moreover, once a block is marked as finalized in this way, all ancestors of that block in the tree of notarized blocks are implicitly finalized. The protocol guarantees that there is always a unique chain of finalized blocks in the tree of notarized blocks. This chain of finalized blocks is the output of consensus. - -At a high level, a consensus round has the following three phases: - - * Block making: In every round, at least one node, called a block maker, proposes a block by broadcasting it to all nodes in the subnet using P2P. As we will see, when things go right, there is only one block maker, but sometimes there may be several. - * Notarization: For a block to become notarized, at least two thirds of the nodes must validate the block and support its notarization. - * Finalization: For a block to become finalized, at least two thirds of the nodes must support its finalization. As we will see, a node will support the finalization of a block only if it did not support the notarization of any other block, and this simple rule guarantees that if a block is finalized in a given round, then there can be no other notarized block in that round. - - - -Let us next look at the different phases of a consensus round in more detail. - -## Block making - -A block maker is a node that proposes a block for the current round with a reference to a notarized block of the previous round. As explained below, a cryptographic mechanism called a random beacon is used to select one node (chosen at random) as the primary block maker (or leader) for the current round. The primary block maker assembles a block containing the ingress messages (submitted directly to the node or received from other nodes in the subnet via P2P) and XNet messages (sent to this subnet from other subnets). After assembling a block, the primary block maker proposes this block by broadcasting it to all nodes in the subnet using P2P. - -If the network is slow or the primary block maker is faulty, the block proposed by the primary block maker may not get notarized within a reasonable time. In this case, after some delay, and using the same random beacon mechanism, other block makers are chosen to step in and supplant the primary block maker. The protocol logic guarantees that one block eventually gets notarized in the current round. - -The block makers for a round are chosen through a random permutation of the nodes of the subnet based on randomness derived from a random beacon. [Chain-key cryptography](https://learn.internetcomputer.org/hc/en-us/articles/34209486239252) is used to produce unpredictable and unbiasable pseudo-random numbers. Consensus uses these pseudo-random numbers to define a pseudo-random permutation on the nodes of the subnet. This assigns a rank to each node in the subnet. The lowest-rank node in the subnet acts as the primary block maker. 
As time goes by without producing a notarized block, nodes of increasing rank gradually step in to supplant the (potentially faulty) nodes of lower rank as block maker. - -In the scenario where the primary block maker is not faulty, and protocol messages get delivered in a timely manner, only the primary block maker will propose a block, and this block will quickly become notarized and finalized. - -![Blockmaker constructs a new block and broadcasts it](https://csojb-wiaaa-aaaal-qjftq-cai.icp0.io/_astro/block_maker.Dwr4LMy1_Z2fhEcM.webp) - -## Notarization - -When a node receives a block proposed by a block maker for the round, it validates the block for syntactic correctness. If the block passes this validity check, the node supports the notarization of the block by broadcasting the block and a notarization share for the block to all nodes in the subnet. A notarization share is a signature share computed using the [BLS multi-signature scheme](https://crypto.stanford.edu/~dabo/pubs/papers/BLSmultisig.html). A block becomes notarized when at least two thirds of the nodes in the subnet support its notarization. In this case, the BLS multi-signature shares may be aggregated to form a compact notarization for the block. - -In the case where the block proposed by the primary block maker gets notarized within a certain amount of time, a node will not support the notarization of any other block in that round. Otherwise, a node may eventually support the notarization of blocks proposed by other block makers of higher rank (but if it has already supported the notarization of a block proposed by a block maker of some rank, it will not support the notarization of blocks proposed by block makers of higher rank). - -![Notarization support of increasing-rank block proposals in a round](https://csojb-wiaaa-aaaal-qjftq-cai.icp0.io/_astro/consensus_notarization.CRg0Lh07_Z1zthef.webp) - -## Finalization - -In a given round, the logic of the protocol guarantees that a node will always obtain a notarized block (assuming less than a third of the nodes in the subnet are faulty). Once it obtains a notarized block, the node will not subsequently support the notarization of any other block. Moreover, if the node did not previously support the notarization of any other block, the node will also support the finalization of this block. It supports the finalization of this block by broadcasting a finalization share for the block to all nodes in the subnet. A finalization share is a signature share computed using the BLS multi-signature scheme. A block becomes finalized when at least two thirds of the nodes in the subnet support its finalization. In this case, the BLS multi-signature shares may be aggregated to form a compact finalization for the block. 
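The finalization rule (a node supports finalization of a block only if it supported the notarization of no other block that round) can be sketched as simple per-round bookkeeping. This is a schematic model, not the replica's actual data structures:

```rust
use std::collections::HashSet;

/// Per-round record of which block hashes this node has supported
/// for notarization. Schematic model only.
struct RoundState {
    notarized_support: HashSet<[u8; 32]>,
}

impl RoundState {
    fn new() -> Self {
        Self { notarized_support: HashSet::new() }
    }

    /// Record that this node broadcast a notarization share for `block`.
    fn support_notarization(&mut self, block: [u8; 32]) {
        self.notarized_support.insert(block);
    }

    /// Support finalization of `block` only if it is the single block
    /// this node supported for notarization in the round. This rule is
    /// what guarantees a finalized block has no notarized sibling.
    fn may_support_finalization(&self, block: [u8; 32]) -> bool {
        self.notarized_support.len() == 1 && self.notarized_support.contains(&block)
    }
}

fn main() {
    let mut round = RoundState::new();
    round.support_notarization([1; 32]);
    assert!(round.may_support_finalization([1; 32]));

    // After supporting a second block, finalization support is withheld.
    round.support_notarization([2; 32]);
    assert!(!round.may_support_finalization([1; 32]));
}
```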
- -## Additional information - -[Blog post on Consensus on the Internet Computer](https://medium.com/dfinity/achieving-consensus-on-the-internet-computer-ee9fbfbafcbc) - -[Consensus White Paper](https://eprint.iacr.org/2021/632.pdf) - -[Extended Abstract published at PODC’22](https://assets.ctfassets.net/ywqk17d3hsnp/1Gutwfrd1lMgiUBJZGCdUG/d3ea7730aba0a4b793741681463239f5/podc-2022-cr.pdf) - -[10 min video](https://www.youtube.com/watch?v=WoLWJ5dsWyI&list=PLVEhhIklNtB4HjWkLhqNacvBDzA0Wt2H1) - -[20 min video](https://www.youtube.com/watch?v=vVLRRYh3JYo) - diff --git a/.migration/learn-hub/how-does-icp-work/blockchain-protocol/execution-layer.md b/.migration/learn-hub/how-does-icp-work/blockchain-protocol/execution-layer.md deleted file mode 100644 index 79541e6..0000000 --- a/.migration/learn-hub/how-does-icp-work/blockchain-protocol/execution-layer.md +++ /dev/null @@ -1,75 +0,0 @@ ---- -learn_hub_id: 34208985618836 -learn_hub_url: "https://learn.internetcomputer.org/hc/en-us/articles/34208985618836-Execution-Layer" -learn_hub_title: "Execution Layer" -learn_hub_section: "Blockchain Protocol" -learn_hub_category: "How does ICP work?" -migrated: false ---- - -# Execution Layer - -The execution layer, the topmost layer of the Internet Computer (IC) core protocol stack, is responsible for executing canister smart contract code. Code execution is done by a [WebAssembly](https://webassembly.org/) (Wasm) virtual machine deployed on every node. WebAssembly bytecode can be executed deterministically, which is important for a blockchain system, and with near-native speed. Canister messages, i.e., ingress messages from users or messages from other canisters, are inducted into the queues of the canisters on the subnet by message routing. Message routing then hands over control to the execution layer, which deterministically executes messages, either until all messages in the canisters’ queues are consumed or the cycles limit for the round has been reached, to ensure bounded round times. - -The execution layer has many unique features, which set the IC apart from other blockchains: - - 1. **Deterministic Time Slicing (DTS):** The execution of very large messages requiring billions of Wasm instructions to be executed can be split across multiple IC rounds. This capability of executing messages over multiple rounds is unique to ICP. - 2. **Concurrency:** Execution of canister Wasm bytecode is done concurrently on multiple CPU cores, which is possible due to each canister having its own isolated state. - 3. **Pseudorandom number generator:** The execution layer has access to an unpredictable and unbiasable pseudorandom number generator. Canisters can thus execute algorithms that require randomness. - - - -## Replicated message execution - -Replicated execution proceeds in rounds. In each round, the message routing layer invokes the execution layer once for executing (a subset of) the messages in the canister input queues. Depending on how much effort (CPU cycles) the execution of the messages of a round requires, a round ends with all messages in the queues being executed or the cycles limit of the round being reached and parts of the messages left to future rounds for execution. - -Each message execution can lead to memory pages of the canister’s state being modified (becoming “dirty” in operating systems terminology), new messages to other canisters on the same or different subnets being created, or a response being generated in the case of an ingress message. 
Changes to memory pages are tracked and the corresponding pages are flagged as “dirty” so that they can be processed when certifying the state. - -When a message execution leads to the generation of a new canister message targeted at a canister in the local subnet, this message can be queued up directly by execution in the input queue of the target canister and scheduled in the same round or an upcoming round. This message does not need to go through consensus since the generation and enqueuing of the new message is completely deterministic and thus happens in exactly the same way on all the nodes of the subnet. - -New messages targeted at other subnets are placed into the target cross-subnet queue (XNet queue) and are certified by the subnet at the end of the round as part of the per-round state certification. The receiving subnet can verify that the XNet messages are authenticated by the subnet by validating the signature with the originating subnet’s public key. - -The execution layer is designed at its core to execute multiple canisters concurrently on different CPU cores. This is possible because each canister has its own isolated state and canister communication is asynchronous. This form of concurrent execution within a subnet, together with the capability of all ICP subnets executing canisters concurrently, makes ICP scalable like a public cloud: ICP scales out by adding more subnets. - -## Deterministic time slicing - -Each execution round progresses alongside the creation of blockchain blocks, which happens roughly once every second. This restricts how much computation can be performed in a single round, with the current limit being around 2 billion instructions given the existing node hardware. - -However, the Internet Computer can handle longer tasks that need up to 20 billion instructions, and some special tasks, like code installation, can even go up to 200 billion instructions. This is achieved using _Deterministic Time Slicing_ (DTS). The idea is to pause a lengthy task at the end of one round and continue it in the next. As a result, a task can span multiple rounds without slowing down the block creation rate. DTS is automatic and transparent to smart contracts, so developers don’t need to write any special code to use it. - -## Memory handling - -Management of the canister bytecode and state (collectively, memory) is one of the key responsibilities of the execution layer. The replicated state that can be held by a single subnet is not bounded by the available RAM in the node machines, but rather by the available SSD storage. Available RAM, however, impacts the performance of the subnet, particularly the access latency of memory pages. Much like in traditional computer systems, this depends a lot on the access patterns of the workload. - -The node machines that comprise the IC are equipped with tens of terabytes of high-end SSD storage and over half a terabyte of RAM to be able to hold large amounts of replicated canister state and Wasm code and achieve good performance when accessing memory. The states obtained while executing canisters are certified (i.e., digitally signed) by the state management component of message routing. Some parts of the state, including the ingress history and the messages that are sent to other subnetworks, are certified every round. The entire state of a subnetwork, including the state of all canisters hosted by that subnetwork, is certified once every (much longer) checkpointing interval. 
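The per-page bookkeeping behind this can be pictured as a write barrier over fixed-size pages: every write records which pages it touched, so that only those pages need to be re-hashed and persisted at certification time. A toy sketch (page size and types are illustrative, not the replica's):

```rust
use std::collections::BTreeSet;

const PAGE_SIZE: usize = 4096; // illustrative; actual granularity may differ

/// Toy canister memory with a write barrier that records dirty pages.
struct CanisterMemory {
    bytes: Vec<u8>,
    dirty_pages: BTreeSet<usize>,
}

impl CanisterMemory {
    fn new(size: usize) -> Self {
        Self { bytes: vec![0; size], dirty_pages: BTreeSet::new() }
    }

    /// Every write marks the pages it touches as dirty.
    fn write(&mut self, offset: usize, data: &[u8]) {
        self.bytes[offset..offset + data.len()].copy_from_slice(data);
        let first = offset / PAGE_SIZE;
        let last = (offset + data.len() - 1) / PAGE_SIZE;
        self.dirty_pages.extend(first..=last);
    }

    /// At certification/checkpoint time, only dirty pages are processed;
    /// the set is then cleared for the next interval.
    fn take_dirty(&mut self) -> BTreeSet<usize> {
        std::mem::take(&mut self.dirty_pages)
    }
}

fn main() {
    let mut mem = CanisterMemory::new(16 * PAGE_SIZE);
    mem.write(10, b"hello");
    mem.write(PAGE_SIZE * 3 - 2, &[1, 2, 3, 4]); // straddles a page boundary
    assert_eq!(mem.take_dirty().into_iter().collect::<Vec<_>>(), vec![0, 2, 3]);
}
```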
- -Memory pages representing canister state are persisted to SSD by the execution layer, without canister programmers needing to take care of this. This _orthogonal persistence_ frees smart contract programmers from reading from and writing to storage explicitly, as on other blockchains or in traditional IT systems. This dramatically simplifies smart contract implementation, helps reduce the TCO of a dapp, and shortens its time to market. Programmers can always keep the full canister smart contract state on the heap or in stable memory. The difference between heap and stable memory is that the heap is cleared on updates of the canister code, while stable memory remains stable throughout updates, hence its name. Any state on the heap that is to be preserved through a canister update must be transferred to stable memory by a canister programmer before an update and restored from there after the update. Best practice is to hold large canister state directly in stable memory to avoid shuffling around large amounts of storage before and after each upgrade. This also avoids the risk of exceeding the cycles limit allowed in an upgrade operation. - -## Random number generation - -Many applications benefit from, or require, a secure random number generator. Yet, generating random numbers in the naïve way as part of execution trivially destroys determinism, as every node would compute different randomness. ICP solves this problem by giving the execution layer access to a decentralized pseudorandom number generator called the _random tape_. The random tape is built using chain-key cryptography. Every round, the subnetwork produces a fresh threshold BLS signature which, by its very nature, is unpredictable and uniformly distributed. This signature can then be used as a seed in a cryptographic pseudorandom generator. This gives canister smart contracts access to a highly efficient and secure random number source, which is another unique feature of ICP. - -## Cycles accounting - -The execution of a canister consumes resources of the Internet Computer, which are paid for with cycles. Each canister holds a local cycles account. Ensuring that the account holds sufficient cycles is the responsibility of its maintainer, which can be a developer, a group of developers, or a decentralized autonomous organization (DAO) – users never pay for sending messages to canisters on the IC. This resource charging model is known as the _reverse gas model_ and is a facilitator for mass adoption of the IC. - -Technically, when the Wasm bytecode is installed or updated on the IC, the canister’s Wasm code is instrumented with code that counts the executed instructions for smart contract messages. This allows for deterministically computing the exact amount of cycles to be charged for a given message being executed. Using Wasm as the bytecode format for canisters has helped greatly in achieving determinism, as Wasm itself is a format that is largely deterministic in its execution. It is crucial that the cycles charging be completely deterministic so that every node charges exactly the same amount of cycles for a given operation and the replicated state machine properties of the subnet are maintained. - -The memory the canister uses, in terms of both its Wasm code and canister state, needs to be paid for with cycles as well. Much like in the public cloud, consumed storage is charged for per time unit. Compared to other blockchains, it is very inexpensive to store data on the IC. 
Furthermore, networking activities such as receiving ingress messages, sending XNet messages, and making HTTPS Outcalls to Web 2.0 servers are paid for in cycles by the canister. Prices for a given resource, e.g., executing Wasm instructions, scale with the replication factor of the subnet, i.e., the number of nodes that power the subnet. - -## Non-replicated message execution - -Non-replicated messages, also known as queries, are executed by a single node and return a response synchronously, much like a regular function invocation in an imperative programming language. The key difference from replicated messages, which are also called update calls, is that queries cannot change the replicated state of the subnet, while update calls can. Queries are, as the name suggests, essentially read operations performed on one replica of the subnet, with the associated trust model that a compromised replica can return any arbitrary result of its choice. - -Analogous to update calls, queries are executed concurrently by multiple threads on a node. - -However, all the nodes of the subnet can concurrently execute different queries because queries are not executed in a replicated way. Query throughput of a subnet thus increases linearly with an increasing number of nodes in the subnet, while update call performance does not. - -Queries by themselves are similar to read operations on a local or cloud Ethereum node on the Ethereum blockchain. The response of any individual node should not be trusted. Whenever an information item to be read is critical, e.g., financial data based on which decisions are made, applications can either use update calls to obtain such information (as the response of an update call is certified by the subnet) or [certified variables](https://learn.internetcomputer.org/hc/en-us/articles/34214090576404), as both are verifiable with the subnet’s public key. - -## Additional Information - -[Usenix ATC article on execution environment](https://www.usenix.org/system/files/atc23-arutyunyan.pdf) - -[16 min video](https://www.youtube.com/watch?v=UHA7W-8My_I&list=PLuhDt1vhGcrfHG_rnRKsqZO1jL_Pd970h&index=16) - diff --git a/.migration/learn-hub/how-does-icp-work/blockchain-protocol/message-routing.md b/.migration/learn-hub/how-does-icp-work/blockchain-protocol/message-routing.md deleted file mode 100644 index b9932d5..0000000 --- a/.migration/learn-hub/how-does-icp-work/blockchain-protocol/message-routing.md +++ /dev/null @@ -1,85 +0,0 @@ ---- -learn_hub_id: 34208241927316 -learn_hub_url: "https://learn.internetcomputer.org/hc/en-us/articles/34208241927316-Message-Routing" -learn_hub_title: "Message Routing" -learn_hub_section: "Blockchain Protocol" -learn_hub_category: "How does ICP work?" -migrated: false ---- - -# Message Routing - -On the Internet Computer, users can interact with canister smart contracts by sending them messages, and canisters themselves can exchange messages with each other. For scalability, the Internet Computer is composed of many subnets and the [Network Nervous System](https://learn.internetcomputer.org/hc/en-us/articles/33692645961236) can add new subnets as required. The message routing component routes messages to and from canisters across all of the Internet Computer’s subnet blockchains and ensures that new subnets can be added seamlessly. - -Message routing is the lower of the two upper layers of the protocol stack. It implements functionality that is crucial for the operation of the IC. 
Its responsibilities can be roughly grouped as follows: - - * Induction of messages received in blocks from consensus; - * Invocation of the execution layer after successful induction; - * Routing of inter-canister messages resulting from execution within and between subnets; - * Certification of the state of the subnet. - - - -Although the layer derives its name from the functionality of routing messages, all the functionality listed above is equally important for the IC. In particular, state certification is heavily used in chain-evolution technology to enable resumption of nodes. - -## Message processing - -Whenever consensus produces a finalized block of messages, that is, a block that has been considered correct (notarized) and finalized by at least two thirds of the subnet’s nodes, this block is handed over to message routing. This marks the transition between the lower and upper half of the protocol stack: The lower two layers are responsible for agreeing, in each round, among all nodes in the subnet on a block of messages to be executed. This block is then handed over to the upper layers for deterministic processing, which, more concretely, means handing it over to message routing, which orchestrates the deterministic processing. - -Once message routing receives a block of messages – comprising both ingress messages submitted by users and XNet messages sent by canisters – the messages are extracted from the block and each message is placed into the input queue of its target canister. This process is called induction and all the queues are collectively referred to as the induction pool. After induction, the execution layer – the topmost layer of the core IC protocol stack – is triggered to deterministically schedule messages in the induction pool for execution and to execute them. Message routing and execution modify the subnet state in a deterministic way, i.e., the state is changed in the same way on every (honest) node of the subnet, which is crucial for achieving the replicated state machine properties of a subnet. The execution of a message can write to memory pages of the canister the message is executed on and change other metadata in the state. The execution of a message can also lead to the creation of new messages targeted at other canisters. Such a message can be targeted either at a canister on the local subnet or at another subnet. In the former case, execution can directly place the new message into the input queue of the target canister. In the latter case, i.e., a new message that is targeted at another subnet, the message is placed into the so-called XNet stream for the target subnet, where it can be picked up by block makers of the target subnet after the streams are certified. - -## Inter-canister messaging - -The execution of a canister message can lead to the creation of a new inter-canister message sent to a canister that can be either _local_ or _remote_ (on a different subnet). - -### Intra-subnet inter-canister messaging - -Intra-subnet, i.e., local, inter-canister messages originating from an executing canister method do not need to go through consensus, as they deterministically result from messages agreed upon by a previous consensus round and their further execution remains completely deterministic. This holds transitively, that is, inter-canister messages can create new inter-canister messages, resulting in a tree of messages. 
Local message invocations can be executed as long as the cycles limit for the round has not yet been exhausted. If the cycles limit is exhausted but there are still local messages left, they remain in the input queues and are executed in upcoming rounds. It is important to note that this local canister-to-canister messaging is not synchronous message invocation as one might be used to from EVM-based blockchains. Rather, local messages are put into the input queue of the target canister and are scheduled for execution asynchronously. These are the standard inter-canister messaging semantics of the Internet Computer. - -### Inter-subnet inter-canister messaging - -Remote inter-canister messages, that is, messages sent to canisters on other subnets, are handled by routing them into the respective outgoing subnet stream for the target subnet. This routing happens at the end of the deterministic execution cycle, i.e., after execution hands back control to message routing. The XNet messages in the stream are certified (signed) using a Merkle-tree-style data representation at the end of the round by the subnet using [chain-key cryptography](https://learn.internetcomputer.org/hc/en-us/articles/34209486239252) as part of the per-round state certification. That is, every message in the outgoing stream is certified by the originating subnet. Replicas on the receiving subnet obtain the XNet messages during block making (part of consensus), validate the certificate, and include valid XNet messages in a consensus block. Thanks to using a Merkle-tree-like data structure to encode and authenticate the XNet streams, parts of the streams can be consumed in a round by the receiving subnets and signatures can still be validated. - -## State certification - -The replicated state of a subnet comprises all the relevant information required for the operation of the subnet: - - * Items certified per round: - * Responses to ingress messages - * XNet messages to be sent to other subnets - * Canister metadata (module hashes, certified variables) - * Items certified per checkpoint: - * The entire replicated state - - - -Certification is always done using chain-key cryptography, thus certifications are computed by the subnet as a whole in a decentralized manner. Such a certification can only exist if the majority of the subnet agrees on the state. - -State certification and secure XNet messaging enable, among other things, the secure and transparent communication of canisters across subnet boundaries, a challenge that any blockchain that has multiple shards has to solve. It also provides crucial building blocks to allow users to read certified parts of the replicated state, e.g., responses to messages submitted by them. Furthermore, it allows nodes to join a subnet efficiently without replaying all blocks since genesis, and fallen-behind nodes to catch up to the most recent state of a subnet. All of this makes message routing an integral layer of the core IC protocol, crucial for realizing some of the IC’s unique and distinguishing features. - -### Per-round certification - -At the end of a round, i.e., when all messages have been executed or the cycles limit for the round has been reached (to ensure rounds cannot take arbitrarily long), the message routing layer performs a certification of parts of the replicated state. The certificate covers the part of the state tree containing - - * Responses to ingress messages, - * XNet messages to be sent to other subnets, and - * Canister metadata (module hashes, certified variables). 
- -The responses to ingress messages are often referred to as the ingress history. The certified responses can be read by users and validated against the subnet’s public key as the response to their ingress messages. Each of the public keys of the individual subnets is, in turn, certified by the NNS using the same mechanism. This means that one can verify that certified responses indeed come from the IC using only the public key of the NNS. This way of validating responses to state-changing messages to a blockchain is extremely powerful compared to other approaches seen in the field, such as reading the response from a transaction log. - -The per-round state certification ensures that any item of data relevant for interactions of users and subnets and between different subnets on the Internet Computer is authenticated. This particularly enables secure and verifiable inter-subnet communication, a crucial feature of the Internet Computer as well as an enabler of its scalability. - -### Per-checkpoint certification - -Wasm code changed through canister updates, written-to (“dirty”) memory pages of canisters, and some other metadata in the replicated state are not certified every round. Instead, they are only certified whenever a so-called checkpoint is created. A checkpoint is a copy of the replicated state that is persisted to disk. Such a checkpoint is written every few hundred rounds (roughly every 10 minutes), and for each checkpoint the subnet also computes a certification. This allows newly joining and fallen-behind nodes to join in without re-executing all blocks. The state certification is done incrementally by incorporating the changes since the last checkpoint certification into the manifest of the previous checkpoint. The manifest can abstractly be viewed as a relatively flat Merkle tree, and the incremental computation can be achieved by updating the leaves that have changed and propagating changes up the tree. Finally, the root hash of the manifest is signed by the subnet, thereby certifying the entire contents of the manifest. The signed result is called a catch-up package, as it can be used by nodes to efficiently catch up to the point in time when the checkpoint was made. (Note that a catch-up package also contains other things required to resume, which are omitted here for the sake of simplicity.) The run time of this certification operation is linear in the number of memory pages that have changed, and not in the overall state size of the subnet. This is crucial, as a subnet can hold terabytes of state in the future and a full recertification of multiple terabytes of replicated state would not be practical at every checkpoint interval. 
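The incremental computation can be sketched over a small binary Merkle tree of page hashes: re-hash only the changed leaves and recombine along their paths to the root. A simplified sketch using the `sha2` crate; the real manifest format is considerably more involved:

```rust
use sha2::{Digest, Sha256};

type Hash = [u8; 32];

fn hash_pair(l: &Hash, r: &Hash) -> Hash {
    let mut h = Sha256::new();
    h.update(l);
    h.update(r);
    h.finalize().into()
}

/// A complete binary Merkle tree over page hashes, stored as a flat
/// array (node i has children 2i+1 and 2i+2). A toy stand-in for the
/// checkpoint manifest.
struct Manifest {
    nodes: Vec<Hash>,
    n_leaves: usize,
}

impl Manifest {
    fn build(pages: &[Vec<u8>]) -> Self {
        let n = pages.len().next_power_of_two();
        let mut nodes = vec![[0u8; 32]; 2 * n - 1];
        for (i, p) in pages.iter().enumerate() {
            nodes[n - 1 + i] = Sha256::digest(p).into();
        }
        for i in (0..n - 1).rev() {
            nodes[i] = hash_pair(&nodes[2 * i + 1], &nodes[2 * i + 2]);
        }
        Self { nodes, n_leaves: n }
    }

    /// Re-hash one changed page and propagate the change up the tree:
    /// O(log n) work per dirty page, independent of total state size.
    fn update_page(&mut self, index: usize, page: &[u8]) {
        let mut i = self.n_leaves - 1 + index;
        self.nodes[i] = Sha256::digest(page).into();
        while i > 0 {
            i = (i - 1) / 2;
            self.nodes[i] = hash_pair(&self.nodes[2 * i + 1], &self.nodes[2 * i + 2]);
        }
    }

    /// The root hash is what the subnet collectively signs.
    fn root(&self) -> Hash {
        self.nodes[0]
    }
}

fn main() {
    let pages: Vec<Vec<u8>> = (0..8u8).map(|i| vec![i; 4096]).collect();
    let mut manifest = Manifest::build(&pages);
    let before = manifest.root();

    // Only page 5 changed since the last checkpoint.
    manifest.update_page(5, &vec![42u8; 4096]);
    assert_ne!(before, manifest.root());
}
```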
- -## Additional information - -[Wiki page describing the message routing layer in more detail](https://wiki.internetcomputer.org/wiki/IC_message_routing_layer) - -[8 min video on Message Routing and Execution Layer](https://www.youtube.com/watch?v=dS3ny6ik1pA) - -[30 min video on Message Routing](https://www.youtube.com/watch?v=YexfeByBXlo) - diff --git a/.migration/learn-hub/how-does-icp-work/blockchain-protocol/peer-to-peer.md b/.migration/learn-hub/how-does-icp-work/blockchain-protocol/peer-to-peer.md deleted file mode 100644 index 6ab0b3e..0000000 --- a/.migration/learn-hub/how-does-icp-work/blockchain-protocol/peer-to-peer.md +++ /dev/null @@ -1,35 +0,0 @@ ---- -learn_hub_id: 34207428453140 -learn_hub_url: "https://learn.internetcomputer.org/hc/en-us/articles/34207428453140-Peer-to-peer" -learn_hub_title: "Peer to peer" -learn_hub_section: "Blockchain Protocol" -learn_hub_category: "How does ICP work?" -migrated: false ---- - -# Peer to peer - -The peer-to-peer layer (P2P) of the Internet Computer, the bottommost layer in the protocol stack, is responsible for the secure and reliable communication between the nodes of a subnet. P2P thus serves as the foundation of the Internet Computer’s protocol stack by enabling nodes to broadcast artifacts, such as user inputs to canisters or protocol messages like block proposals. P2P's key property is the guaranteed message delivery to all required subnet nodes despite varying real-world network conditions and node failures. The P2P layer is used by the [consensus layer](https://learn.internetcomputer.org/hc/en-us/articles/34207558615956), the next layer in the stack above it, to broadcast artifacts to the other nodes in the subnet. - -## Abortable broadcast - -At the heart of the P2P layer is the [Abortable Broadcast primitive](https://arxiv.org/abs/2410.22080), which is critical for efficient communication in a setting where nodes may fail or even act maliciously. With Abortable Broadcast, nodes explicitly abort the transmission of artifacts they no longer need. This allows Abortable Broadcast to provide strong delivery guarantees in the presence of network congestion, node or link failures, and backpressure. By preserving bandwidth and bounding the size of its data structures, Abortable Broadcast prevents overload from malicious nodes while ensuring the delivery of non-aborted artifacts from honest nodes. It resembles a publish–subscribe model, with the added ability to abort in-flight messages when needed. - -The P2P layer allows the filtering of incoming artifacts, accepting only necessary ones while discarding or delaying the admission of others. This ensures crucial artifacts are obtained more quickly than others. This optimization is well-known from traditional networking and reduces the processing load of the layers above P2P. - -## QUIC Transport - -The Abortable Broadcast implementation relies on a transport component consisting of a custom RPC library built on top of [QUIC](https://en.wikipedia.org/wiki/QUIC). This library enables the efficient orchestration of multiple higher-level protocols on the same replica. Key features of the transport component include message multiplexing and caller pushback in the event that packet consumption is significantly slower than packet production. - -## Security - -To prevent Denial of Service (DoS) attacks, nodes connect only with other nodes in the same subnet, with membership managed by the [Network Nervous System (NNS)](https://learn.internetcomputer.org/hc/en-us/articles/33692645961236). 
The NNS registry canister acts as a service discovery mechanism for the P2P layer, enabling P2P to ensure encrypted and authenticated communication between nodes through TLS. - - - -## Additional information - -[Blog post on P2P](https://medium.com/dfinity/a-new-p2p-layer-is-coming-to-the-internet-computer-772ac2a29484) -[Scientific article on Abortable Broadcast and its implementation for ICP](https://arxiv.org/abs/2410.22080) -[Video on Abortable Broadcast](https://www.youtube.com/watch?v=f8-G_C4li70&list=PLVEhhIklNtB4HjWkLhqNacvBDzA0Wt2H1) - diff --git a/.migration/learn-hub/how-does-icp-work/blockchain-protocol/state-synchronization.md b/.migration/learn-hub/how-does-icp-work/blockchain-protocol/state-synchronization.md deleted file mode 100644 index a5e26c7..0000000 --- a/.migration/learn-hub/how-does-icp-work/blockchain-protocol/state-synchronization.md +++ /dev/null @@ -1,35 +0,0 @@ ---- -learn_hub_id: 34471579767572 -learn_hub_url: "https://learn.internetcomputer.org/hc/en-us/articles/34471579767572-State-Synchronization" -learn_hub_title: "State Synchronization" -learn_hub_section: "Blockchain Protocol" -learn_hub_category: "How does ICP work?" -migrated: false ---- - -# State Synchronization - -To allow nodes to efficiently join a subnet that is already running or to catch up with the current state in case they have been offline for some time, the protocol supports state synchronization without processing all messages that have ever been executed. - -To this end, the protocol periodically creates checkpoints of the entire subnet state. The checkpoints are [certified](https://learn.internetcomputer.org/hc/en-us/articles/34208241927316) by the subnet through a signature on a Merkle-tree-like structure – the manifest – and made available as part of a catch-up package via the [Peer-to-Peer (P2P) layer](https://learn.internetcomputer.org/hc/en-us/articles/34207428453140). As the name suggests, a catch-up package allows a node to catch up if it has fallen behind, e.g., because it was offline for some time. In addition, it allows new nodes to join, e.g., if the subnet is to grow in size or a node needs to be replaced because of a failure. - -## Nodes that join the subnet - -A new node can download the latest catch-up package and, after validating it, download the state corresponding to the checkpoint. Downloading the state requires the transfer of large amounts (gigabytes to terabytes) of data from the node's peers. This is done efficiently and in parallel, using a protocol that chunks the state and allows different chunks to be downloaded from different peers. Every chunk is individually authenticated through its hash: the tree-like structure of the manifest allows each chunk to be verified individually relative to the root hash in the catch-up package. The chunking protocol is similar to the approach that BitTorrent uses for downloading large files from many peers. - -Once the full state corresponding to the checkpoint has been authentically downloaded, the node catches up to the current block height by processing all the blocks that have been generated in the subnet since the checkpoint. 
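The per-chunk verification can be pictured in a few lines: a downloaded chunk is accepted only if its hash matches the hash certified via the catch-up package; otherwise the node retries from another peer. An illustrative sketch using the `sha2` crate, with hypothetical type and function names:

```rust
use sha2::{Digest, Sha256};

type Hash = [u8; 32];

/// One (simplified) manifest entry: the expected hash of a state chunk.
struct ChunkInfo {
    index: usize,
    expected_hash: Hash,
}

/// Accept a downloaded chunk only if it matches the hash certified
/// through the catch-up package.
fn verify_chunk(info: &ChunkInfo, data: &[u8]) -> bool {
    let got: Hash = Sha256::digest(data).into();
    got == info.expected_hash
}

fn main() {
    let chunk = vec![7u8; 1 << 20]; // a 1 MiB chunk of state
    let info = ChunkInfo {
        index: 0,
        expected_hash: Sha256::digest(&chunk).into(),
    };

    assert!(verify_chunk(&info, &chunk));
    assert!(!verify_chunk(&info, b"tampered")); // reject corrupt or malicious data
    println!("chunk {} verified", info.index);
}
```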
Without state synchronization, it would be practically impossible for nodes to (re-)join a busy subnet: they would need to replay all blocks from the very first block ever created on the subnet, as is done in other blockchains. Thanks to the state sync protocol, which allows recent checkpoints to be downloaded, only a few blocks need to be replayed, as opposed to replaying every block from the start of the blockchain. This is important because the IC is intended to have a high throughput of compute operations per unit of time, much like cloud servers running their applications. Consider a subnet that has been running for multiple years with high CPU utilization. It would be infeasible for a newly joining node to catch up by replaying all blocks starting from the subnet's genesis block, as it would have to redo multiple CPU-years' worth of computation. Thus, state synchronization is a necessary feature for a blockchain that wants to operate successfully under real-world conditions where nodes do fail and need replacement. - -## Nodes that are behind - -If a node is not newly added but has experienced downtime or another performance degradation and needs to catch up, it may still have an older checkpoint. In this case, only the chunks that differ from the local checkpoint need to be downloaded, which can significantly reduce the volume of data transferred. - -The blockchain state is organized as a Merkle tree and can currently reach a size of up to a terabyte. The syncing node might already have most of the blockchain state and may not need to download everything. Therefore, the syncing node tries to download only the subtrees of the peers’ blockchain state that differ from its local state. The syncing node first requests the children of the root of the blockchain state. It then recursively downloads the subtrees that differ from its local state. - -![The catching-up replica only syncs the parts of the replicated state that differ from the up-to-date replica](https://csojb-wiaaa-aaaal-qjftq-cai.icp0.io/_astro/state-sync.CGBHsPNA_Z1fxTja.webp) - -### Additional Information - -[20 min video on State Synchronization](https://www.youtube.com/watch?v=WaNJINjGleg&list=PLuhDt1vhGcrfHG_rnRKsqZO1jL_Pd970h&index=14&t=2s) - diff --git a/docs/concepts/network-overview.md b/docs/concepts/network-overview.md index 1100c29..20c1523 100644 --- a/docs/concepts/network-overview.md +++ b/docs/concepts/network-overview.md @@ -45,7 +45,7 @@ This produces one block per round (approximately every 1 second). Update calls a Query calls skip consensus entirely: a single node handles the request and returns its local state, which is why queries are fast (milliseconds) but provide weaker authenticity guarantees than update calls. -For a deeper dive into the consensus protocol and other protocol internals, see the [Learn Hub](https://learn.internetcomputer.org). +For a deeper dive into the consensus protocol and other protocol internals, see [Protocol Stack](protocol/index.md). ## Boundary nodes diff --git a/docs/concepts/protocol/consensus.md b/docs/concepts/protocol/consensus.md new file mode 100644 index 0000000..8dc1845 --- /dev/null +++ b/docs/concepts/protocol/consensus.md @@ -0,0 +1,53 @@ +--- +title: "Consensus" +description: "How ICP subnets reach agreement on message ordering through block making, notarization, and finalization." +--- + +The consensus protocol allows every node in a subnet to agree on which messages to process and in what order. Each subnet runs its own independent instance of the protocol. The output of each consensus round is a single finalized block of ordered messages that every node then executes deterministically, producing the same state transition on each. 
+ +ICP's consensus is designed to meet three requirements: + +- **Low latency.** Blocks are finalized in roughly one second, achieving near-instant finality. +- **High throughput.** Many messages can be included in each block. +- **Robustness.** The protocol degrades gracefully under node or network failures, maintaining safety regardless of message delivery timing. + +## Cryptographic finality + +ICP provides cryptographic finality rather than probabilistic finality. Probabilistic finality considers a block final only after enough subsequent blocks have built on top of it. ICP avoids this approach for two reasons: probabilistic finality is a weak guarantee, and it would substantially increase the time before a message response can be trusted. + +The ICP consensus protocol achieves cryptographic finality while making minimal assumptions about the network. Safety does not depend on any bound on message delivery time (the protocol only assumes an asynchronous network). For a globally distributed network, synchrony is not a realistic assumption. When messages do arrive promptly, the protocol makes progress with good latency. Correctness is always guaranteed regardless of message delays, as long as fewer than one third of subnet nodes are faulty. + +## Consensus rounds + +The protocol maintains a tree of notarized blocks, with a special genesis block at the root. The protocol proceeds in rounds. Each round adds at least one new notarized block to the tree as a child of a notarized block from the previous round. When things proceed normally, exactly one notarized block is added and it is immediately finalized. Once a block is finalized, all of its ancestors are implicitly finalized. The protocol guarantees a unique chain of finalized blocks. This chain is the output of consensus. + +A consensus round has three phases. + +### Block making + +In every round, one or more nodes called [block makers](../../references/glossary.md#block-maker) propose a block. Each block contains a reference to a notarized block from the previous round, ingress messages submitted by users, and XNet messages received from other subnets. + +Block makers are selected through a random permutation of subnet nodes, using randomness derived from a [random beacon](../../references/glossary.md#random-beacon) produced by [chain-key cryptography](../chain-key-cryptography.md). The permutation assigns a rank to each node. The lowest-rank node acts as the primary block maker and broadcasts its proposal to all subnet nodes. + +If the primary block maker is faulty or the network is slow and no notarized block appears within a timeout, nodes of increasing rank step in to propose blocks. The protocol guarantees that one block eventually gets notarized in every round. + +### Notarization + +When a node receives a block proposal, it validates it for syntactic correctness. If valid, the node broadcasts the block along with a notarization share: a BLS multi-signature share. A block becomes notarized when at least two thirds of subnet nodes have submitted notarization shares for it. These shares can be aggregated into a compact notarization. + +If the primary block maker's proposal is notarized within the timeout, a node will not support the notarization of any other block in that round. Otherwise, a node may support notarization of blocks from higher-rank block makers (but never of a rank higher than one it has already supported). 
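The two-thirds rule amounts to counting notarization shares from distinct nodes per block, with the caveat that a real notarization is an aggregated BLS multi-signature whose shares are cryptographically verified, not merely counted. An illustrative sketch:

```rust
use std::collections::{HashMap, HashSet};

/// Count notarization shares per block from distinct nodes and report
/// which blocks have reached the two-thirds threshold. A real replica
/// verifies and aggregates BLS signature shares instead of counting.
fn notarized_blocks(
    shares: &[(u64 /* node id */, [u8; 32] /* block hash */)],
    subnet_size: usize,
) -> Vec<[u8; 32]> {
    let threshold = (2 * subnet_size + 2) / 3; // ceil(2n/3)
    let mut support: HashMap<[u8; 32], HashSet<u64>> = HashMap::new();
    for (node, block) in shares {
        support.entry(*block).or_default().insert(*node);
    }
    support
        .into_iter()
        .filter(|(_, nodes)| nodes.len() >= threshold)
        .map(|(block, _)| block)
        .collect()
}

fn main() {
    // 13-node subnet: 9 shares from distinct nodes are needed.
    let shares: Vec<(u64, [u8; 32])> = (0u64..9).map(|node| (node, [1u8; 32])).collect();
    assert_eq!(notarized_blocks(&shares, 13), vec![[1u8; 32]]);
}
```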
+ +### Finalization + +Once a node obtains a notarized block, it will not subsequently support notarization of any other block in that round. If the node had not previously supported notarization of any other block, it also broadcasts a finalization share for this block. A block is finalized when at least two thirds of nodes have submitted finalization shares. + +This rule guarantees that if a block is finalized in a given round, no other notarized block exists in that round: the chain remains unique. + +## Further reading + +- [Protocol Stack](index.md): how consensus fits into the four-layer architecture +- [DFINITY Consensus blog post](https://medium.com/dfinity/achieving-consensus-on-the-internet-computer-ee9fbfbafcbc) +- [Consensus white paper](https://eprint.iacr.org/2021/632.pdf) +- [Extended abstract published at PODC '22](https://assets.ctfassets.net/ywqk17d3hsnp/1Gutwfrd1lMgiUBJZGCdUG/d3ea7730aba0a4b793741681463239f5/podc-2022-cr.pdf) + + diff --git a/docs/concepts/protocol/execution.md b/docs/concepts/protocol/execution.md new file mode 100644 index 0000000..c32d9e8 --- /dev/null +++ b/docs/concepts/protocol/execution.md @@ -0,0 +1,71 @@ +--- +title: "Execution Layer" +description: "How ICP deterministically executes canister code using WebAssembly, deterministic time slicing, and concurrent execution." +--- + +The execution layer is the topmost layer of the ICP core protocol stack. It is responsible for executing canister code after message routing has inducted messages into canister input queues. Code runs in a [WebAssembly](https://webassembly.org/) (Wasm) virtual machine deployed on every subnet node. Wasm bytecode executes deterministically and at near-native speed, both of which are essential properties for a replicated system. + +Execution proceeds deterministically: every honest node on the subnet executes the same messages in the same order and reaches the same resulting state. + +## Replicated execution + +Execution proceeds in rounds. Each round, message routing invokes the execution layer once to process (a subset of) the messages in canister input queues. A round ends either when all queued messages have been executed or when the cycles limit for the round is reached, ensuring bounded round times. + +Executing a message can: + +- Modify memory pages in the canister's state (marking them "dirty") +- Create new messages to other canisters on the same or different subnets +- Generate a response to an ingress message + +Messages to local canisters are queued directly in the target canister's input queue and scheduled for the same or an upcoming round, without going through consensus. Messages to canisters on other subnets are placed into the XNet queue and certified by the subnet at the end of the round. + +## Concurrent execution + +The execution layer is designed to execute multiple canisters concurrently on different CPU cores. This is possible because each canister has its own isolated state and inter-canister communication is asynchronous. Concurrent execution within a subnet, combined with multiple subnets running in parallel, makes ICP scale like a public cloud: by adding more subnets. + +## Deterministic time slicing + +Each execution round is synchronized with block production, which happens roughly once per second. The current per-round instruction limit is approximately 2 billion instructions given present node hardware. 
+ +For longer computations (up to 20 billion instructions, or up to 200 billion for code installation), ICP uses **Deterministic Time Slicing (DTS)**. DTS pauses a long-running computation at the end of a round and resumes it in the next, allowing a task to span multiple rounds without slowing block creation. DTS is automatic and transparent to canisters: no special canister code is needed. + +## Memory handling + +One of the execution layer's key responsibilities is managing canister bytecode and state (collectively: canister memory). The replicated state a subnet can hold is bounded by available SSD storage, not RAM. Available RAM affects performance through access latency, much as it does in traditional systems. + +ICP node machines are equipped with high-end SSD storage and substantial RAM to hold large amounts of replicated canister state and Wasm code. + +Memory pages representing canister state are persisted to SSD automatically by the execution layer. This [**orthogonal persistence**](../orthogonal-persistence.md) frees developers from explicitly managing reads and writes to storage. The full canister state is always available on the heap or in stable memory: + +- **Heap memory** is cleared when canister code is upgraded. State intended to survive upgrades must be moved to stable memory before the upgrade and restored afterward. +- **Stable memory** persists across code upgrades. Large state should be kept in stable memory directly to avoid the cost and risk of copying it back and forth at upgrade time. + +## Random number generation + +Many applications require a secure source of randomness. Generating random numbers naively in a replicated setting destroys determinism, since each node would produce different values. ICP solves this with the **random tape**: a decentralized pseudorandom number generator built using chain-key cryptography. + +Each round, the subnet produces a fresh threshold BLS signature. This signature is unpredictable and uniformly distributed by its nature. It is used as a seed for a cryptographic pseudorandom generator, giving canisters access to a secure, efficient, and verifiable source of randomness. + +## Cycles accounting + +Executing a canister consumes network resources. These resources are paid for with [**cycles**](../../references/glossary.md#cycle). Each canister holds a local cycles account. The canister itself pays for its own storage and computation: users never send cycles with their messages. Ensuring the cycles account is funded is the responsibility of the canister's maintainer (a developer, a team, or a community-governed application). + +When canister Wasm code is installed or upgraded, it is instrumented with instruction-counting code. This allows the exact number of cycles to be charged for each message execution in a fully deterministic way, so every node charges the same amount and replicated state machine properties are preserved. + +Cycles are also charged for: + +- **Storage.** Both Wasm code and canister state are charged per unit of time, similar to cloud storage billing. Prices scale with the subnet's replication factor. +- **Networking.** Receiving ingress messages, sending XNet messages, and making HTTPS outcalls are all charged in cycles. + +## Query execution + +[Query calls](../../references/glossary.md#query) (non-replicated execution) are executed by a single node and return a response synchronously. Unlike update calls, queries cannot change the replicated state of the subnet: they are read operations on one replica. 
Queries execute concurrently across multiple threads on a single node, and all nodes in the subnet can serve different queries concurrently, so query throughput scales linearly with subnet size. + +The tradeoff is the trust model: a single node executes the query, so a compromised node could return an arbitrary result. For critical data, use update calls (which produce responses certified by the subnet) or [certified variables](../../guides/backends/certified-variables.md). + +## Further reading + +- [Protocol Stack](index.md): how execution fits into the four-layer architecture +- [Usenix ATC paper on the ICP execution environment](https://www.usenix.org/system/files/atc23-arutyunyan.pdf) + + diff --git a/docs/concepts/protocol/index.md b/docs/concepts/protocol/index.md new file mode 100644 index 0000000..cbd2b94 --- /dev/null +++ b/docs/concepts/protocol/index.md @@ -0,0 +1,42 @@ +--- +title: "Protocol Stack" +description: "The four-layer architecture that every ICP subnet runs: peer-to-peer, consensus, message routing, and execution." +--- + +The Internet Computer is created by the Internet Computer Protocol (ICP), which gives the network its name. ICP consists of multiple subnets, with each subnet running its own instance of the protocol stack. Each subnet hosts canisters and executes messages sent to them by users or by other canisters (which may be hosted on the same or a different subnet). + +A message addressed to a canister is executed by every node in the corresponding subnet. Execution updates the canister state. To keep state in sync across all nodes, every node must execute the same messages in the same order: fully deterministically. This replicated state machine property is the core of what makes ICP a trustworthy execution environment. + +## Four-layer architecture + +Each node runs a replica process structured in four layers: + +1. [Peer-to-peer](peer-to-peer.md): secure and reliable message broadcast between nodes +2. [Consensus](consensus.md): agreement on which messages to process and in what order +3. [Message routing](message-routing.md): delivery of messages to canister input queues and state certification +4. [Execution](execution.md): deterministic execution of canister code + +The lower two layers (peer-to-peer and consensus) are responsible for agreeing, each round, on a block of messages. The upper two layers (message routing and execution) deterministically process that block on every node. + +At the start of a round, all honest nodes hold identical state: the replicated state of the subnet, which includes the current state of every canister hosted there. By executing the messages in a finalized block in a completely deterministic way, every node reaches the same resulting state. + +## Cross-subnet messaging + +Canisters communicate with each other regardless of whether they share a subnet. The protocol handles both: + +- **Intra-subnet messages.** Messages between canisters on the same subnet do not go through consensus. They are placed directly into the target canister's input queue and scheduled for execution. This gives local inter-canister calls lower latency and higher throughput. +- **Cross-subnet messages (XNet).** Messages to canisters on other subnets flow through the XNet stream. The originating subnet certifies these messages using [chain-key cryptography](../chain-key-cryptography.md), and block makers on the receiving subnet validate the certificate and include the messages in a block.
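+
+From a developer's point of view the two cases look identical: an inter-canister call is written the same way whether or not the callee shares a subnet. The sketch below is a minimal illustration assuming the Rust `ic-cdk` crate; the `get_balance` method, its types, and the error handling are hypothetical.
+
+```rust
+use candid::Principal;
+use ic_cdk::update;
+
+#[update]
+async fn balance_of(ledger_id: Principal, owner: Principal) -> u64 {
+    // The call site does not reveal whether `ledger_id` is local or remote:
+    // the protocol either inducts the message directly (same subnet) or
+    // certifies it into an XNet stream (different subnet).
+    let (balance,): (u64,) = ic_cdk::call(ledger_id, "get_balance", (owner,))
+        .await
+        .expect("inter-canister call failed");
+    balance
+}
+```
+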
+ +## State synchronization + +To allow nodes to efficiently join a running subnet or catch up after downtime, the protocol supports [state synchronization](state-synchronization.md). Rather than replaying every message ever executed, a new or recovering node downloads a recent certified checkpoint and replays only the blocks produced since that checkpoint. + +## Further reading + +- [Consensus](consensus.md): block making, notarization, and finalization in detail +- [Peer-to-peer](peer-to-peer.md): Abortable Broadcast and QUIC transport +- [Message routing](message-routing.md): induction, XNet streaming, and state certification +- [Execution](execution.md): WebAssembly execution, deterministic time slicing, and cycles +- [State synchronization](state-synchronization.md): catch-up packages and incremental sync + + diff --git a/docs/concepts/protocol/message-routing.md b/docs/concepts/protocol/message-routing.md new file mode 100644 index 0000000..64f5177 --- /dev/null +++ b/docs/concepts/protocol/message-routing.md @@ -0,0 +1,70 @@ +--- +title: "Message Routing" +description: "How ICP routes messages between canisters across subnets, certifies subnet state, and enables secure cross-subnet communication." +--- + +Message routing is the lower of the two upper layers of the ICP protocol stack. It sits above consensus and below execution, orchestrating the flow of messages from finalized blocks into canister input queues, triggering execution, routing the resulting inter-canister messages, and certifying subnet state. + +Its responsibilities fall into four areas: + +- **Induction.** Extracting messages from finalized consensus blocks and placing them into canister input queues. +- **Execution invocation.** Triggering the execution layer to process the inducted messages. +- **Message routing.** Forwarding inter-canister messages within the subnet and into outgoing XNet streams for cross-subnet delivery. +- **State certification.** Certifying the subnet's replicated state using [chain-key cryptography](../chain-key-cryptography.md). + +Although the layer is named for message routing, state certification is equally important: it underlies chain-evolution technology and allows nodes to catch up to the current state without replaying all historical blocks. + +## Message processing + +Whenever consensus produces a finalized block, it hands the block to message routing. This marks the transition between the lower and upper halves of the protocol stack: the lower two layers agree on a block, and the upper two layers process it deterministically. + +Message routing extracts [ingress messages](../../references/glossary.md#ingress-message) (submitted by users) and [XNet](../../references/glossary.md#xnet) messages (sent by canisters on other subnets) from the block. Each message is placed into the input queue of its target canister. This process is called **induction**, and all queues together are called the [**induction pool**](../../references/glossary.md#induction-pool). After induction, message routing triggers the execution layer, which schedules and executes messages from the pool. + +Message routing and execution modify subnet state in a deterministic way: every honest node makes the same state changes, preserving the replicated state machine properties of the subnet. + +## Inter-canister messaging + +Executing a canister message can produce new inter-canister messages. How those messages are handled depends on whether the target canister is on the same subnet or a different one. 
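+
+The two cases are detailed in the subsections below. As a schematic illustration of the routing decision only (the types are simplified stand-ins, not the replica's actual data structures):
+
+```rust
+use std::collections::{HashMap, VecDeque};
+
+type CanisterId = u64;
+type SubnetId = u64;
+
+struct Message {
+    to: CanisterId,
+    payload: Vec<u8>,
+}
+
+struct MessageRouting {
+    local_subnet: SubnetId,
+    routing_table: HashMap<CanisterId, SubnetId>,
+    input_queues: HashMap<CanisterId, VecDeque<Message>>, // induction pool
+    xnet_streams: HashMap<SubnetId, VecDeque<Message>>,   // one per target subnet
+}
+
+impl MessageRouting {
+    fn route(&mut self, msg: Message) {
+        match self.routing_table.get(&msg.to).copied() {
+            // Same subnet: induct directly, bypassing consensus.
+            Some(subnet) if subnet == self.local_subnet => {
+                self.input_queues.entry(msg.to).or_default().push_back(msg);
+            }
+            // Different subnet: append to the outgoing XNet stream, which is
+            // certified at the end of the round.
+            Some(subnet) => {
+                self.xnet_streams.entry(subnet).or_default().push_back(msg);
+            }
+            // Unknown canister: a real implementation rejects the message.
+            None => {}
+        }
+    }
+}
+```
+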
+ +### Intra-subnet messages + +Messages to canisters on the same subnet do not go through consensus. Because they deterministically result from an already-agreed message, their execution is also deterministic. The execution layer places these messages directly into the target canister's input queue. This process is transitive: a message can produce more messages, forming a tree of execution. Intra-subnet messages are executed as long as the cycles limit for the round has not been exhausted. Remaining messages are deferred to subsequent rounds. + +Local canister-to-canister messaging is asynchronous. Messages are queued and scheduled rather than synchronously invoked, which is the standard inter-canister semantics on ICP. + +### Cross-subnet messages (XNet) + +Messages to canisters on other subnets are placed into the outgoing XNet stream for the target subnet. At the end of the round, message routing certifies these streams using a Merkle-tree-style data representation and chain-key cryptography. This means every outgoing XNet message is authenticated by the originating subnet's collective signature. + +Block makers on the receiving subnet fetch certified XNet messages during block assembly, validate the certificate against the originating subnet's public key, and include valid messages in a consensus block. The Merkle-tree structure allows partial consumption: a receiving subnet can include some XNet messages from a stream in one round and the rest in a later round, while still validating each message's authenticity. + +## State certification + +The replicated state of a subnet includes all information needed for its operation. Message routing certifies this state in two modes. + +### Per-round certification + +At the end of each round (when all messages have been executed or the cycles limit has been reached), message routing certifies a subset of the state tree: + +- Responses to ingress messages (the ingress history) +- XNet messages queued for other subnets +- Canister metadata (module hashes and certified variables) + +These certified responses can be read and validated against the subnet's public key by users. Each subnet's public key is in turn certified by the [Network Nervous System (NNS)](../../references/glossary.md#network-nervous-system-nns), so a certified response can be verified against a single root of trust: the NNS public key. This provides a powerful alternative to reading transaction logs, as responses are authenticated by the network rather than by a centralized server. + +Per-round state certification enables secure, verifiable inter-subnet communication, which is a core enabler of ICP's scalability across many subnets. + +### Per-checkpoint certification + +Not all state is certified every round. Canister Wasm code and written memory pages are certified only at checkpoints, which are periodic snapshots of the entire replicated state persisted to disk. + +Checkpoints are created roughly every 10 minutes. For each checkpoint, the subnet computes a certification over a Merkle-tree manifest. Certification is incremental: only the pages that changed since the last checkpoint need to be processed, and their changes are propagated up the tree. The root hash of the manifest is signed by the subnet, forming a [**catch-up package**](../../references/glossary.md#catch-up-package-cup) that new or recovering nodes can use to join without replaying the full block history. 
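+
+To make the incremental computation concrete, here is an illustrative sketch (assuming the Rust `sha2` crate) of rehashing only dirty chunks against a cache of leaf hashes. It is deliberately simplified: a real manifest is a deeper Merkle tree over files and memory pages, with changes propagated up the tree rather than folded flat.
+
+```rust
+use sha2::{Digest, Sha256};
+
+fn leaf_hash(chunk: &[u8]) -> [u8; 32] {
+    Sha256::digest(chunk).into()
+}
+
+/// Rehash only the chunks marked dirty, then fold the cached leaf hashes
+/// into a root. Hashing chunk *data* is the expensive step, and it is
+/// proportional to the number of changed chunks, not the total state size.
+fn manifest_root(chunks: &[Vec<u8>], dirty: &[bool], cached: &mut [[u8; 32]]) -> [u8; 32] {
+    for (i, chunk) in chunks.iter().enumerate() {
+        if dirty[i] {
+            cached[i] = leaf_hash(chunk);
+        }
+    }
+    let mut root = Sha256::new();
+    for h in cached.iter() {
+        root.update(h);
+    }
+    root.finalize().into()
+}
+```
+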
+ +The time to compute a checkpoint certification is linear in the number of changed memory pages, not the total state size. This matters as subnets can hold terabytes of state: a full recertification of that volume at each checkpoint interval would be impractical. + +## Further reading + +- [Protocol Stack](index.md): how message routing fits into the four-layer architecture +- [State synchronization](state-synchronization.md): how catch-up packages are used by joining nodes + + diff --git a/docs/concepts/protocol/peer-to-peer.md b/docs/concepts/protocol/peer-to-peer.md new file mode 100644 index 0000000..1dc6085 --- /dev/null +++ b/docs/concepts/protocol/peer-to-peer.md @@ -0,0 +1,31 @@ +--- +title: "Peer-to-Peer Layer" +description: "How ICP nodes broadcast artifacts and exchange protocol messages using the Abortable Broadcast primitive and QUIC transport." +--- + +The peer-to-peer (P2P) layer is the bottommost layer in the ICP protocol stack. It is responsible for secure and reliable communication between the nodes of a subnet, providing the foundation on which all higher protocol layers depend. + +P2P allows nodes to broadcast artifacts: user inputs to canisters and protocol messages such as block proposals. Its key property is guaranteed message delivery to all required subnet nodes despite varying real-world network conditions and node failures. The P2P layer is used by the [consensus layer](consensus.md) to broadcast artifacts to the other nodes in a subnet. + +## Abortable Broadcast + +At the heart of the P2P layer is the Abortable Broadcast primitive, which is critical for efficient communication in a setting where nodes may fail or act maliciously. With Abortable Broadcast, nodes can explicitly abort the transmission of artifacts they no longer need. This allows the protocol to provide strong delivery guarantees in the presence of network congestion, node or link failures, and backpressure. + +By preserving bandwidth and bounding the size of its data structures, Abortable Broadcast prevents overload from malicious nodes while ensuring delivery of non-aborted artifacts from honest nodes. It resembles a publish/subscribe model with the added ability to abort in-flight messages when needed. + +The P2P layer allows filtering of incoming artifacts: accepting only necessary ones while discarding or delaying others. Crucial artifacts are obtained more quickly than non-essential ones. This reduces the processing load of the layers above P2P. + +## QUIC transport + +The Abortable Broadcast implementation relies on a transport component built on top of [QUIC](https://en.wikipedia.org/wiki/QUIC): a custom RPC library that enables efficient orchestration of multiple higher-level protocols on the same replica. Key features include message multiplexing and caller pushback when packet consumption lags behind packet production. + +## Security + +To prevent denial-of-service attacks, nodes connect only with other nodes in the same subnet. Subnet membership is managed by the [Network Nervous System (NNS)](../../references/glossary.md#network-nervous-system-nns). The NNS registry canister acts as a service discovery mechanism for the P2P layer, enabling encrypted and authenticated communication between nodes through TLS. 
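+
+To make the abort semantics of Abortable Broadcast concrete, here is a schematic sketch of the pattern using tokio tasks. It is a hypothetical illustration, not the replica's implementation: each in-flight transmission is a cancellable task, so an artifact that is no longer needed (for example, because its round has finished) stops consuming bandwidth immediately.
+
+```rust
+use std::collections::HashMap;
+use tokio::task::JoinHandle;
+
+type ArtifactId = u64;
+
+#[derive(Default)]
+struct Broadcaster {
+    // One send task per (artifact, peer) pair still in flight.
+    in_flight: HashMap<ArtifactId, Vec<JoinHandle<()>>>,
+}
+
+impl Broadcaster {
+    /// Start sending an artifact to every peer, keeping handles so the
+    /// transmissions can be aborted later. Must run inside a tokio runtime.
+    fn broadcast(&mut self, id: ArtifactId, artifact: Vec<u8>, peers: Vec<String>) {
+        let handles = peers
+            .into_iter()
+            .map(|peer| {
+                let bytes = artifact.clone();
+                tokio::spawn(async move {
+                    // A real implementation pushes `bytes` to `peer` over a
+                    // QUIC stream; elided here.
+                    let _ = (peer, bytes);
+                })
+            })
+            .collect();
+        self.in_flight.insert(id, handles);
+    }
+
+    /// Abort every outstanding transmission of an artifact.
+    fn abort(&mut self, id: ArtifactId) {
+        if let Some(handles) = self.in_flight.remove(&id) {
+            for handle in handles {
+                handle.abort();
+            }
+        }
+    }
+}
+```
+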
+ +## Further reading + +- [Protocol Stack](index.md): how P2P fits into the four-layer architecture +- [Abortable Broadcast paper](https://arxiv.org/abs/2410.22080) + + diff --git a/docs/concepts/protocol/state-synchronization.md b/docs/concepts/protocol/state-synchronization.md new file mode 100644 index 0000000..b6778ab --- /dev/null +++ b/docs/concepts/protocol/state-synchronization.md @@ -0,0 +1,31 @@ +--- +title: "State Synchronization" +description: "How ICP nodes join or re-join a subnet by downloading certified checkpoints instead of replaying the full block history." +--- + +State synchronization allows nodes to join a running subnet or recover from downtime without replaying every message ever executed. Instead, the protocol creates periodic certified checkpoints that capture a complete snapshot of the subnet state. A node that needs to catch up downloads a recent checkpoint and replays only the blocks produced since that checkpoint. + +Checkpoints are certified by the subnet through a signature over a Merkle-tree manifest (see [Message routing: per-checkpoint certification](message-routing.md#per-checkpoint-certification)). They are made available to other nodes via the [peer-to-peer layer](peer-to-peer.md) as part of a [**catch-up package**](../../references/glossary.md#catch-up-package-cup). + +## Joining nodes + +A new node downloads the latest catch-up package, validates it, and then downloads the corresponding state. This involves transferring potentially gigabytes to terabytes of data. The transfer is done efficiently and in parallel from multiple peers: the state is chunked, each chunk is authenticated individually through its hash in the manifest's Merkle tree, and different chunks can be downloaded from different peers simultaneously. This approach is similar to BitTorrent. + +Once the full checkpoint state is downloaded and authenticated, the node replays the blocks produced since that checkpoint to reach the current block height. + +Without state synchronization, joining a busy subnet would be impractical. A node would need to replay every block from the subnet's genesis, potentially amounting to years of CPU computation on a subnet that has been running with high utilization. State synchronization makes this feasible by limiting replay to only recent blocks. + +## Recovering nodes + +A node that was temporarily offline may still hold an older checkpoint. In this case, only the chunks that differ from its local checkpoint need to be downloaded, which can significantly reduce the volume of data transferred. + +The subnet state is organized as a Merkle tree and can reach up to a terabyte in size. A recovering node first requests the children of the root of the state tree from its peers. It then recursively downloads only the subtrees that differ from its local state, skipping the parts it already has. + +This incremental approach ensures that a recovering node transfers the minimum amount of data needed to rejoin the subnet, rather than downloading the full state again. + +## Further reading + +- [Message routing](message-routing.md): how checkpoints and state certification work +- [Peer-to-peer](peer-to-peer.md): the broadcast layer used to transfer checkpoint chunks + + diff --git a/docs/references/glossary.md b/docs/references/glossary.md index e456386..4006d92 100644 --- a/docs/references/glossary.md +++ b/docs/references/glossary.md @@ -48,6 +48,10 @@ referred to as one **e8**. 
A **batch** is a collection of [messages](#message) whose order is agreed upon by [consensus](#consensus). +#### block maker + +A **block maker** is a [node](#node) selected by the [consensus](#consensus) protocol to propose a block in a given round. Block makers are chosen through a random permutation of [subnet](#subnet) nodes using randomness from the [random beacon](#random-beacon). The lowest-ranked node acts as the primary block maker; higher-ranked nodes step in if the primary fails to produce a notarized block within the timeout. + #### beneficiary The **beneficiary** of an [account](#account) is the [principal](#principal) who owns the [balance](#balance) of the account. The beneficiary of an @@ -171,8 +175,8 @@ means of which a number of [nodes](#node) can reach agreement about a value or state. Consensus is a core component of the [replica](#replica) -software. The [consensus](https://learn.internetcomputer.org/hc/en-us/articles/34207558615956-Consensus) layer selects [messages](#message) -from the [peer-to-peer](https://learn.internetcomputer.org/hc/en-us/articles/34207428453140-Peer-to-peer) artifact pool and pulls messages from the +software. The [consensus](../concepts/protocol/consensus.md) layer selects [messages](#message) +from the [peer-to-peer](../concepts/protocol/peer-to-peer.md) artifact pool and pulls messages from the cross-network streams of other [subnets](#subnet) and organizes them into a [batch](#batch), which is delivered to the [message routing](#message-routing) layer. @@ -210,6 +214,10 @@ A **data center** (DC) is a physical site that hosts software infrastructure required for node deployment. Data centers are selected and vetted by the [NNS](#network-nervous-system-nns). +#### Deterministic Time Slicing (DTS) + +**Deterministic Time Slicing** (DTS) is a mechanism in the [execution layer](../concepts/protocol/execution.md) that allows a long-running canister computation to span multiple [consensus](#consensus) rounds. Instead of timing out, a computation that exceeds the per-round instruction limit is paused at the end of a round and automatically resumed in the next. DTS is transparent to canisters and requires no special canister code. + #### dissolve delay The **dissolve delay** is the amount of time that @@ -360,7 +368,7 @@ another or from a user to a canister. #### message routing -The **[message routing](https://learn.internetcomputer.org/hc/en-us/articles/34208241927316-Message-Routing)** layer receives [batches](#batch) from +The **[message routing](../concepts/protocol/message-routing.md)** layer receives [batches](#batch) from the [consensus](#consensus) layer and inducts them into the [induction pool](#induction-pool). Message routing then schedules a set of [canisters](#canister) to execute messages @@ -474,6 +482,10 @@ Node providers are selected and vetted by the [NNS](#network-nervous-system-nns) ## O +#### orthogonal persistence + +**Orthogonal persistence** is the storage model used by the ICP [execution layer](../concepts/protocol/execution.md). Canister memory pages are persisted to disk automatically after each round without requiring explicit read or write operations. Developers can treat canister state as always in memory; the runtime handles persistence transparently. See the [orthogonal persistence concept page](../concepts/orthogonal-persistence.md) for details.
+ +#### output queue + +Each [canister](#canister) has an **output queue** of @@ -483,18 +495,18 @@ Each [canister](#canister) has an **output queue** of #### peer-to-peer (P2P) -In common usage, **[peer-to-peer](https://learn.internetcomputer.org/hc/en-us/articles/34207428453140-Peer-to-peer)** (P2P) computing or networking is a +In common usage, **[peer-to-peer](../concepts/protocol/peer-to-peer.md)** (P2P) computing or networking is a distributed application architecture that partitions workload across a network of equally privileged computer [nodes](#node) so that participants can contribute resources such as processing power, disk storage, or network bandwidth to handle application workload. -The **[peer-to-peer](https://learn.internetcomputer.org/hc/en-us/articles/34207428453140-Peer-to-peer) layer** collects and disseminates +The **[peer-to-peer](../concepts/protocol/peer-to-peer.md) layer** collects and disseminates [messages](#message) and artifacts from users and from other nodes. The [nodes](#node) of a [subnet](#subnet) form a -dedicated [peer-to-peer](https://learn.internetcomputer.org/hc/en-us/articles/34207428453140-Peer-to-peer) broadcast network that facilitates the secure +dedicated [peer-to-peer](../concepts/protocol/peer-to-peer.md) broadcast network that facilitates the secure **bounded-time/eventual delivery** broadcast of artifacts (such as [ingress messages](#ingress-message), control messages, and their signature shares). The [consensus](#consensus) layer @@ -545,6 +557,10 @@ preserved. Queries are synchronous and can be made to any ## R +#### random beacon + +The **random beacon** is a source of cryptographic randomness produced each [consensus](#consensus) round using threshold BLS signatures. Every [subnet](#subnet) node contributes a signature share; when enough shares are combined, a verifiable random value is produced. The random beacon is used to select [block makers](#block-maker) and other randomized elements of the consensus protocol. + #### replica The **replica** is an instance of software containing all the protocol components @@ -604,6 +620,10 @@ regular ledger [account](#account) (i.e., any ledger account except the [ICP supply account](#icp-supply-account)) to another regular ledger account. +#### Trusted Execution Environment (TEE) + +A **Trusted Execution Environment** (TEE) is a hardware-enforced isolation mechanism that protects the memory and state of a virtual machine from the host operating system and hypervisor. ICP uses AMD SEV-SNP as its TEE technology on supported nodes, providing memory encryption, VM launch measurements, and attestation reports that allow external parties to verify the exact software a node is running. See the [node infrastructure concept page](../concepts/node-infrastructure.md#trusted-execution-environments) for details. + ## U #### user @@ -645,6 +665,10 @@ stack-based virtual machine. ## X +#### XNet + +**XNet** is the cross-subnet messaging stream used to deliver [messages](#message) between [canisters](#canister) on different [subnets](#subnet). XNet messages produced by the [execution layer](../concepts/protocol/execution.md) are certified by the originating subnet using [chain-key](#chain-key) cryptography and validated by [block makers](#block-maker) on the receiving subnet before being included in a block. + #### XDR **XDR** is the currency code for *special drawing rights (SDR)*. SDRs are supplementary foreign exchange assets that are defined and maintained by the International Monetary Fund (IMF).
SDRs are not a currency themselves but represent a claim on currencies held by IMF member countries, for which they may be exchanged. The ICP developer docs refer to currencies by their currency codes; SDRs are therefore referenced by their currency code **XDR** in this documentation.