This document describes how our network works. At the moment it is known to be somewhat outdated, as we are in the process of significantly refactoring the network protocol.
1. Overview
Near Protocol uses its own implementation of a custom peer-to-peer network. Peers who join the network are represented by nodes, and connections between them by edges.
The purpose of this document is to describe the inner workings of the near-network
package, and to serve as a reference for future engineers to understand the network
code without any prior knowledge.
2. Code structure
near-network runs on top of the Actix actor framework
(https://actix.rs/docs/). The code is split between 4 actors:
PeerManagerActor, PeerActor, RoutingTableActor, and EdgeValidatorActor.
2.1 EdgeValidatorActor (currently called EdgeVerifierActor in the code (TODO rename))
EdgeValidatorActor runs on a separate thread. The purpose of this actor is to
validate edges, where each edge represents a connection between two peers
and is signed with a cryptographic signature by both parties. The process of
edge validation involves verifying cryptographic signatures, which can be quite
expensive, and was therefore moved to another thread.
Responsibilities:
- validating edges by checking whether the cryptographic signatures match (a sketch is shown below).
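To make this concrete, here is a minimal, std-only sketch of what checking one edge amounts to. `PeerId`, `Signature`, and `signature_is_valid` are simplified stand-ins introduced for illustration, not the real near-network definitions; only the shape of the check (both parties must have signed) follows the text above.

```rust
// Simplified stand-ins for the real near-network types.
#[derive(Debug)]
struct PeerId(String); // stands in for a public key

struct Signature(Vec<u8>);

struct Edge {
    peer0: PeerId,
    peer1: PeerId,
    nonce: u64,
    signature0: Signature,
    signature1: Signature,
}

impl Edge {
    /// The payload both peers sign: the pair of peers plus the nonce.
    fn data_to_sign(&self) -> Vec<u8> {
        format!("{:?}|{:?}|{}", self.peer0, self.peer1, self.nonce).into_bytes()
    }

    /// An edge is valid only if *both* parties signed it.
    fn verify(&self) -> bool {
        signature_is_valid(&self.peer0, &self.data_to_sign(), &self.signature0)
            && signature_is_valid(&self.peer1, &self.data_to_sign(), &self.signature1)
    }
}

/// Stand-in for the expensive cryptographic verification (e.g. ED25519),
/// which is the reason this work runs on a separate thread.
fn signature_is_valid(_signer: &PeerId, _data: &[u8], _sig: &Signature) -> bool {
    true // a real implementation verifies the signature here
}

fn main() {
    let edge = Edge {
        peer0: PeerId("node-a".into()),
        peer1: PeerId("node-b".into()),
        nonce: 1,
        signature0: Signature(vec![]),
        signature1: Signature(vec![]),
    };
    assert!(edge.verify());
}
```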
2.2 RoutingTableActor
RoutingTableActor maintains a view of the P2P network, represented by a set of
nodes and edges.
If a message needs to be sent between two directly connected nodes, it can be
sent over the TCP connection. Otherwise, RoutingTableActor is responsible for
finding the best path between them.
Responsibilities:
- keeps the set of all edges of the P2P network, called the routing table
- connects to EdgeValidatorActor and asks for edges to be validated when needed
- has logic related to exchanging edges between peers
2.3 PeerActor
Whenever a new connection gets accepted, an instance of PeerActor gets
created. Each PeerActor keeps a physical TCP connection to exactly one
peer.
Responsibilities:
- Maintaining physical connection.
- Reading messages from peers, decoding them, and then forwarding them to the right place.
- Encoding messages and sending them to peers on the physical layer.
- Routing messages between PeerManagerActor and other peers.
2.4 PeerManagerActor
PeerManagerActor is the main actor of the near-network crate. It acts as a
bridge between the world outside, that is the other peers, and ClientActor and
ViewClientActor, which handle processing any operations on the chain.
PeerManagerActor maintains information about the p2p network via
RoutingTableActor and, indirectly through PeerActor, holds connections to nodes
on the network. All messages going to other nodes, or coming from other nodes,
will be routed through this actor. PeerManagerActor is responsible for accepting
incoming connections from the outside world and creating PeerActors to manage
them.
Responsibilities:
- Accepting new connections.
- Maintaining the list of PeerActors, creating and deleting them.
- Routing information about new edges between PeerActors and RoutingTableManager.
- Routing messages between ClientActor, ViewClientActor and PeerActors, and consequently other peers.
- Maintaining the RouteBack structure, which has information on how to send replies to messages.
3. Code flow - initialization
The PeerManagerActor gets started first. PeerManagerActor opens a TCP server, which
listens for incoming connections. It starts RoutingTableActor, which then starts
EdgeValidatorActor. When an incoming connection gets accepted, it
starts a new PeerActor on its own thread.
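The following is a simplified, std-only sketch of that startup order; plain structs stand in for the Actix actors, and the listening address is illustrative.

```rust
use std::net::TcpListener;

struct EdgeValidatorActor;
struct RoutingTableActor {
    _edge_validator: EdgeValidatorActor,
}
struct PeerManagerActor {
    _routing_table: RoutingTableActor,
    _listener: TcpListener,
}

impl PeerManagerActor {
    fn start(listen_addr: &str) -> std::io::Result<Self> {
        // RoutingTableActor starts EdgeValidatorActor (on its own thread
        // in the real code).
        let routing_table = RoutingTableActor { _edge_validator: EdgeValidatorActor };
        // PeerManagerActor opens a TCP server listening for incoming
        // connections; each accepted connection would start a new PeerActor
        // (omitted here; see section 7).
        let listener = TcpListener::bind(listen_addr)?;
        Ok(PeerManagerActor { _routing_table: routing_table, _listener: listener })
    }
}

fn main() -> std::io::Result<()> {
    let _peer_manager = PeerManagerActor::start("127.0.0.1:24567")?;
    Ok(())
}
```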
4. NetworkConfig
near-network reads its configuration from NetworkConfig, which is part of the client config.
Here is a list of the fields read from the config:
- boot_nodes - list of nodes to connect to on start.
- addr - listening address.
- max_num_peers - by default we connect to up to 40 peers; the current implementation supports up to 128.
A trimmed-down sketch of these fields is shown below.
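This sketch covers only the fields listed above; the real NetworkConfig in nearcore has many more options, and the types here are simplified.

```rust
use std::net::SocketAddr;

#[derive(Debug)]
struct PeerInfo {
    id: String,       // PeerId of the boot node, simplified to a string
    addr: SocketAddr, // address of the boot node
}

#[derive(Debug)]
struct NetworkConfig {
    /// Nodes to connect to on start (used when the peer database is empty).
    boot_nodes: Vec<PeerInfo>,
    /// Address this node listens on.
    addr: Option<SocketAddr>,
    /// By default we connect to up to 40 peers; the current implementation
    /// supports up to 128.
    max_num_peers: u32,
}

fn main() {
    let config = NetworkConfig {
        boot_nodes: vec![PeerInfo {
            id: "ed25519:...".to_string(), // placeholder key
            addr: "10.0.0.1:24567".parse().unwrap(),
        }],
        addr: Some("0.0.0.0:24567".parse().unwrap()),
        max_num_peers: 40,
    };
    println!("{config:?}");
}
```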
5. Connecting to other peers.
Each peer maintains a list of known peers. They are stored in the database. If
the database is empty, the list of peers, called boot nodes, will be read from the
boot_nodes option in the config. The peer to connect to is chosen at random from
the list of known nodes by the PeerManagerActor::sample_random_peer method. A
minimal sketch of that selection is shown below.
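This sketch assumes the `rand` crate; the real PeerManagerActor::sample_random_peer also filters the candidate list (for example, excluding banned peers and peers we are already connected to), which is omitted here.

```rust
use rand::seq::SliceRandom;

#[derive(Debug, Clone)]
struct PeerInfo {
    addr: String,
}

/// Pick a random known peer to connect to, if any are known.
fn sample_random_peer(known_peers: &[PeerInfo]) -> Option<PeerInfo> {
    known_peers.choose(&mut rand::thread_rng()).cloned()
}

fn main() {
    let known_peers = vec![
        PeerInfo { addr: "10.0.0.1:24567".into() },
        PeerInfo { addr: "10.0.0.2:24567".into() },
    ];
    if let Some(peer) = sample_random_peer(&known_peers) {
        println!("connecting to {peer:?}");
    }
}
```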
6. Edges & network - in code representation
The P2P network is represented by a list of peers and a list of edges. Each peer
is represented by the structure PeerId, which is defined by the peer's public
key PublicKey; each edge is represented by the structure Edge.
Both are defined below.
6.1 PublicKey
We use two types of public keys:
- a 256 bit ED25519 public key
- a 512 bit Secp256K1 public key
Public keys are defined in PublicKey enum, which consists of those two
variants.
```rust
pub struct ED25519PublicKey(pub [u8; 32]);

pub struct Secp256K1PublicKey([u8; 64]);

pub enum PublicKey {
    ED25519(ED25519PublicKey),
    SECP256K1(Secp256K1PublicKey),
}
```
6.2 PeerId
Each peer is uniquely defined by its PublicKey, and represented by PeerId
struct.
```rust
pub struct PeerId(PublicKey);
```
6.3 Edge
Each edge is represented by the Edge structure. It contains the following:
- A pair of nodes, represented by their public keys.
- nonce - a unique number representing the state of the edge, starting with 1. An odd number represents an active edge; an even number represents an edge in which one of the nodes has confirmed that the edge is removed.
- Signatures from both peers, for active edges.
- A signature from one peer, in case the edge got removed.
A small sketch of the nonce convention is shown below.
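The parity rule above boils down to two small functions; this self-contained sketch just restates it in code.

```rust
// Odd nonce => active edge; even nonce => one side confirmed removal.
fn edge_is_active(nonce: u64) -> bool {
    nonce % 2 == 1
}

/// Removing an active edge bumps the nonce to the next (even) value.
fn nonce_after_removal(nonce: u64) -> u64 {
    assert!(edge_is_active(nonce), "only active edges can be removed");
    nonce + 1
}

fn main() {
    assert!(edge_is_active(1));            // freshly added edge
    assert_eq!(nonce_after_removal(1), 2); // edge removed once
    assert!(!edge_is_active(2));           // even nonce => removed
    assert!(edge_is_active(3));            // re-added edge is active again
}
```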
6.4 Graph representation
RoutingTableActor is responsible for storing and maintaining the set of all edges.
They are kept in the edges_info data structure, of type HashSet<Edge>.
```rust
pub struct RoutingTableActor {
    /// Collection of edges representing the P2P network.
    /// It's indexed by `Edge::key()` and can be searched by calling `get()`
    /// with a `(PeerId, PeerId)` argument.
    pub edges_info: HashSet<Edge>,
    /// ...
}
```
7. Code flow - connecting to a peer - handshake
When PeerManagerActor starts, it begins listening on a specific port.
7.1 - Step 1 - monitor_peers_trigger runs
PeerManager checks if we need to connect to another peer by running the
PeerManager::is_outbound_bootstrap_needed method. If it returns true, we will try to
connect to a new node. Let's call the current node node A.
7.2 - Step 2 - choosing node to connect to
Method PeerManager::sample_random_peer will be called, and it returns node B
that we will try to connect to.
7.3 - Step 3 - OutboundTcpConnect message
PeerManagerActor will send to itself a message OutboundTcpConnect in order
to connect to node B.
```rust
pub struct OutboundTcpConnect {
    /// Peer information of the outbound connection
    pub target_peer_info: PeerInfo,
}
```
7.4 - Step 4 - Handling the OutboundTcpConnect message
On receiving the message, the handle_msg_outbound_tcp_connect method will be
called, which calls TcpStream::connect to create a new connection.
7.5 - Step 5 - Connection gets established
Once the connection with the outgoing peer gets established, the try_connect_peer
method will be called, and a new PeerActor will be created and started. Once the
PeerActor starts, it will send a Handshake message to outgoing node B over the
TCP connection.
This message contains protocol_version, node A's metadata, as well as all
information necessary to create an Edge.
```rust
pub struct Handshake {
    /// Current protocol version.
    pub(crate) protocol_version: u32,
    /// Oldest supported protocol version.
    pub(crate) oldest_supported_version: u32,
    /// Sender's peer id.
    pub(crate) sender_peer_id: PeerId,
    /// Receiver's peer id.
    pub(crate) target_peer_id: PeerId,
    /// Sender's listening addr.
    pub(crate) sender_listen_port: Option<u16>,
    /// Peer's chain information.
    pub(crate) sender_chain_info: PeerChainInfoV2,
    /// Represents the new `edge`. Contains only the `nonce` and the `Signature` from the sender.
    pub(crate) partial_edge_info: PartialEdgeInfo,
}
```
7.6 - Step 6 - Handshake arrives at node B
Node B receives the Handshake message. Then it performs various validation
checks. These include:
- checking the signature of the edge from the other peer;
- checking whether the nonce in the edge sent matches;
- checking whether the protocol version is at least OLDEST_BACKWARD_COMPATIBLE_PROTOCOL_VERSION;
- checking the other node's view of the chain state.
A condensed sketch of these checks is shown below.
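This is an illustrative sketch only: the constant's value, the field names, and the stubbed-out signature and chain-state checks are assumptions, not the real code.

```rust
const OLDEST_BACKWARD_COMPATIBLE_PROTOCOL_VERSION: u32 = 29; // hypothetical value

struct PartialEdgeInfo {
    nonce: u64,
    // signature, ... (omitted)
}

struct Handshake {
    protocol_version: u32,
    partial_edge_info: PartialEdgeInfo,
    // sender_peer_id, sender_chain_info, ... (omitted)
}

fn validate_handshake(h: &Handshake, expected_nonce: u64) -> Result<(), &'static str> {
    if h.protocol_version < OLDEST_BACKWARD_COMPATIBLE_PROTOCOL_VERSION {
        return Err("protocol version too old");
    }
    if h.partial_edge_info.nonce != expected_nonce {
        return Err("edge nonce mismatch");
    }
    // The real code also verifies the sender's signature over the edge and
    // sanity-checks the peer's view of the chain state.
    Ok(())
}

fn main() {
    let handshake = Handshake {
        protocol_version: 50,
        partial_edge_info: PartialEdgeInfo { nonce: 1 },
    };
    assert!(validate_handshake(&handshake, 1).is_ok());
}
```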
If everything is successful, PeerActor will send a RegisterPeer message to
PeerManagerActor. This message contains everything needed to add the PeerActor
to the list of active connections in PeerManagerActor.
Otherwise, the PeerActor will be stopped immediately or after some timeout.
```rust
pub struct RegisterPeer {
    pub(crate) actor: Addr<PeerActor>,
    pub(crate) peer_info: PeerInfo,
    pub(crate) peer_type: PeerType,
    pub(crate) chain_info: PeerChainInfoV2,
    // Edge information from this node.
    // If this is None, it implies we are the outbound connection, so we need
    // to create our EdgeInfo part and send it to the other peer.
    pub(crate) this_edge_info: Option<EdgeInfo>,
    // Edge information from the other node.
    pub(crate) other_edge_info: EdgeInfo,
    // Protocol version of the new peer. May be higher than ours.
    pub(crate) peer_protocol_version: ProtocolVersion,
}
```
7.7 - Step 7 - PeerManagerActor receives RegisterPeer message - node B
In the handle_msg_consolidate method, the RegisterPeer message will be validated. If
successful, the register_peer method will be called, which adds the PeerActor to the
list of connected peers.
Each connected peer is represented in PeerManagerActor by the ActivePeer data
structure.
TODO: Rename ActivePeer to ConnectedPeer (see #5428).
```rust
/// Contains information relevant to an active peer.
struct ActivePeer { // will be renamed to `ConnectedPeer`, see #5428
    addr: Addr<PeerActor>,
    full_peer_info: FullPeerInfo,
    /// Number of bytes we've received from the peer.
    received_bytes_per_sec: u64,
    /// Number of bytes we've sent to the peer.
    sent_bytes_per_sec: u64,
    /// Last time we requested peers.
    last_time_peer_requested: Instant,
    /// Last time we received a message from this peer.
    last_time_received_message: Instant,
    /// Time when the connection was established.
    connection_established_time: Instant,
    /// Who started the connection: Inbound (other) or Outbound (us).
    peer_type: PeerType,
}
```
7.8 - Step 8 - Exchange routing table part 1 - node B
At the end of the register_peer method, node B will perform a
RoutingTableSync sync, sending the list of known edges representing the full graph
and the list of known AnnounceAccount. Those will be covered later, in their
dedicated sections (see sections TODO1, TODO2).
```rust
message: PeerMessage::RoutingTableSync(SyncData::edge(new_edge)),
```
```rust
/// Contains metadata used for routing messages to a particular `PeerId` or `AccountId`.
pub struct RoutingTableSync { // also known as `SyncData` (#5489)
    /// List of known edges from `RoutingTableActor::edges_info`.
    pub(crate) edges: Vec<Edge>,
    /// List of known `account_id` to `PeerId` mappings.
    /// Useful for the `send_message_to_account` method, to route a message to a particular account.
    pub(crate) accounts: Vec<AnnounceAccount>,
}
```
7.9 - Step 9 - Exchange routing table part 2 - node A
Upon receiving the RoutingTableSync message, node A will reply with its own
RoutingTableSync message.
7.10 - Step 10 - Exchange routing table part 3 - node B
Node B will get the message from A and update its routing table.
8. Adding new edges to routing tables
This section covers the process of adding new edges, received from other nodes, to the routing table. It consists of several steps covered below.
8.1 Step 1
PeerManagerActor receives a RoutingTableSync message containing a list of new
edges of the P2P network to add.
This message is then forwarded to RoutingTableActor.
8.2 Step 2
PeerManagerActor forwards those edges to RoutingTableActor inside a
ValidateEdgeList struct.
ValidateEdgeList contains:
- the list of edges to verify;
- the peer who sent us the edges.
A hypothetical sketch of its shape is shown below.
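This sketch is based only on the description above and on step 4 below; the field names and types are illustrative, not the real definitions (an mpsc channel stands in for the concurrent queue).

```rust
use std::sync::mpsc::Sender;

struct PeerId(String); // simplified
struct Edge;           // see section 6.3

struct ValidateEdgeList {
    /// The peer that sent us the edges; it gets banned if any edge is invalid.
    source_peer_id: PeerId,
    /// The edges to verify (already-known ones are filtered out in step 3).
    edges: Vec<Edge>,
    /// Concurrent queue used to send validated edges back to PeerManagerActor
    /// (see step 4).
    sender: Sender<Edge>,
}

fn main() {
    let (tx, _rx) = std::sync::mpsc::channel();
    let _msg = ValidateEdgeList {
        source_peer_id: PeerId("node-x".into()),
        edges: vec![Edge],
        sender: tx,
    };
}
```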
8.3 Step 3
RoutingTableActor gets the ValidateEdgeList message and filters out edges
that have already been verified, that is, those already in
RoutingTableActor::edges_info.
Then, it updates edge_verifier_requests_in_progress to mark that edge
verifications are in progress, and that edges shouldn't be pruned from the routing
table (see section TODO).
Then, after removing the already-validated edges, the modified message is forwarded
to EdgeValidatorActor.
8.4 Step 4
EdgeValidatorActor goes through the list of all edges and checks whether each edge
is valid (its cryptographic signatures match, etc.).
If any edge is not valid, the peer that sent it will be banned.
Edges that are validated are written to a concurrent queue,
ValidateEdgeList::sender. This queue is used to transfer edges from
EdgeValidatorActor back to PeerManagerActor. A self-contained sketch of this
loop is shown below.
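In this sketch, the `valid` flag stands in for "the cryptographic signatures check out", and an mpsc channel stands in for the concurrent queue; only the ban-on-invalid-edge and send-back-on-valid-edge shape follows the text.

```rust
use std::sync::mpsc;

#[derive(Debug, PartialEq)]
struct PeerId(String);

struct Edge {
    valid: bool,
}

/// Validate edges one by one; on the first invalid edge, report the sending
/// peer so it can be banned. Valid edges go into the queue back to
/// PeerManagerActor.
fn validate_edges(
    source_peer_id: PeerId,
    edges: Vec<Edge>,
    sender: mpsc::Sender<Edge>,
) -> Result<(), PeerId> {
    for edge in edges {
        if !edge.valid {
            return Err(source_peer_id); // peer sent a bad edge => ban it
        }
        sender.send(edge).expect("receiver dropped");
    }
    Ok(())
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let edges = vec![Edge { valid: true }, Edge { valid: true }];
    assert!(validate_edges(PeerId("node-x".into()), edges, tx).is_ok());
    assert_eq!(rx.iter().count(), 2); // both edges made it back
}
```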
8.5 Step 5
broadcast_validated_edges_trigger runs and gets the validated edges from
EdgeVerifierActor.
Every new edge will be broadcast to all connected peers.
Then, all validated edges received from EdgeVerifierActor will be sent again
to RoutingTableActor inside an AddVerifiedEdges message.
8.6 Step 6
When RoutingTableActor receives RoutingTableMessages::AddVerifiedEdges, the
add_verified_edges_to_routing_table method will be called. It will add the edges to
the RoutingTableActor::edges_info struct and mark the routing table as needing
recalculation (see RoutingTableActor::needs_routing_table_recalculation).
9. Routing table computation
Routing table computation does a few things:
- For each peer B, it calculates the set of peers |C_b|, such that each peer in the set is on a shortest path to B.
- It removes unreachable edges from memory and stores them to disk.
- The distance is calculated as the minimum number of nodes on the path from a given node A to each other node on the network. That is, A has a distance of 0 to itself, its neighbors have a distance of 1, the neighbors of its neighbors have a distance of 2, etc.
9.1 Step 1
PeerManagerActor runs an update_routing_table_trigger every
UPDATE_ROUTING_TABLE_INTERVAL seconds.
A RoutingTableMessages::RoutingTableUpdate message is sent to
RoutingTableActor to request routing table re-computation.
9.2 Step 2
RoutingTableActor receives the message and then:
- calls the recalculate_routing_table method, which computes RoutingTableActor::peer_forwarding: HashMap<PeerId, Vec<PeerId>>. For each PeerId on the network, this gives the list of connected peers which are on a shortest path to the destination. It marks reachable peers in the peer_last_time_reachable struct.
- calls prune_edges, which removes from memory all edges that were not reachable for at least 1 hour, based on the peer_last_time_reachable data structure. Those edges are then stored to disk.
A compact sketch of the shortest-path computation is shown below.
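This std-only BFS sketch makes the shortest-path part concrete under simplifying assumptions: PeerId is reduced to a u32, a symmetric adjacency map stands in for the real edge set, and the peer_last_time_reachable bookkeeping and pruning are omitted.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type PeerId = u32; // simplified stand-in

/// For every reachable peer, compute the set of our direct neighbours that
/// lie on a shortest path to it - a simplified `peer_forwarding`.
fn recalculate_routing_table(
    me: PeerId,
    adjacency: &HashMap<PeerId, Vec<PeerId>>,
) -> HashMap<PeerId, HashSet<PeerId>> {
    let mut distance: HashMap<PeerId, u32> = HashMap::from([(me, 0)]);
    let mut first_hops: HashMap<PeerId, HashSet<PeerId>> = HashMap::new();
    let mut queue = VecDeque::from([me]);
    while let Some(peer) = queue.pop_front() {
        let d = distance[&peer];
        for &next in adjacency.get(&peer).into_iter().flatten() {
            // First time we see `next`: it sits at distance d + 1.
            let next_d = *distance.entry(next).or_insert_with(|| {
                queue.push_back(next);
                d + 1
            });
            if next_d == d + 1 {
                // `peer` is on a shortest path to `next`, so `next` inherits
                // `peer`'s first hops (or becomes its own first hop if it is
                // our direct neighbour).
                let hops = if peer == me {
                    HashSet::from([next])
                } else {
                    first_hops[&peer].clone()
                };
                first_hops.entry(next).or_default().extend(hops);
            }
        }
    }
    first_hops
}

fn main() {
    // 0 (us) - 1, 0 - 2, 1 - 3: peer 3 is only reachable through peer 1.
    let adjacency = HashMap::from([
        (0, vec![1, 2]),
        (1, vec![0, 3]),
        (2, vec![0]),
        (3, vec![1]),
    ]);
    let table = recalculate_routing_table(0, &adjacency);
    assert_eq!(table[&3], HashSet::from([1]));
    assert_eq!(table[&2], HashSet::from([2]));
}
```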
9.3 Step 3
RoutingTableActor sends a RoutingTableUpdateResponse message back to
PeerManagerActor.
PeerManagerActor keeps a local copy of edges_info, called local_edges_info,
containing only the edges adjacent to the current node.
RoutingTableUpdateResponse contains:
- the list of local edges which PeerManagerActor should remove;
- peer_forwarding, which describes how to route messages in the P2P network;
- peers_to_ban - the list of peers to ban for sending us edges which failed validation in EdgeVerifierActor.
9.4 Step 4
PeerManagerActor receives the RoutingTableUpdateResponse and then:
- updates the local copy of peer_forwarding, used for routing messages;
- removes local_edges_to_remove from local_edges_info;
- bans peers who sent us invalid edges.
10. Message transportation layers.
This section describes the different protocols for sending messages currently used
in Near.
10.1 Messages between Actors.
Near is built on the Actix actor framework
(https://actix.rs/book/actix/sec-2-actor.html). Usually each actor runs on its
own dedicated thread; some, like PeerActor, have one thread per instance.
Only messages implementing actix::Message can be sent between threads.
Each actor has its own queue; processing of messages happens asynchronously.
We should not leak implementation details into the spec.
Actix messages can be found by looking for impl actix::Message.
10.2 Messages sent through TCP
Near uses borsh serialization to exchange messages between nodes (see
https://borsh.io/). We should be careful when making changes to these messages,
as we have to maintain backward compatibility. Only messages implementing
BorshSerialize and BorshDeserialize can be sent. We also use borsh for database
storage. A minimal example of the pattern is shown below.
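This example assumes the `borsh` crate (0.x API with try_to_vec / try_from_slice); the message type itself is illustrative, not one of the real network messages.

```rust
use borsh::{BorshDeserialize, BorshSerialize};

#[derive(BorshSerialize, BorshDeserialize, Debug, PartialEq)]
struct Ping {
    nonce: u64,
}

fn main() {
    let msg = Ping { nonce: 7 };
    // Borsh produces a deterministic, compact binary encoding - which is
    // why changing a field's type or order breaks backward compatibility.
    let bytes = msg.try_to_vec().expect("serialization failed");
    let decoded = Ping::try_from_slice(&bytes).expect("deserialization failed");
    assert_eq!(msg, decoded);
}
```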
10.3 Messages sent/received through chain/jsonrpc
Near runs a JSON REST server (see actix_web::HttpServer). All messages sent
and received must implement serde::Serialize and serde::Deserialize.
11. Code flow - routing a message
This is an example of a message that is sent between nodes:
RawRoutedMessage
(https://github.com/near/nearcore/blob/fa8749dc60fe0de8e94c3046571731c622326e9f/chain/network-primitives/src/types.rs#L362).
Each of these messages has a target - either an account_id, a peer_id,
or a hash (which seems to be used only for route back...). If the target is an
account, it will be converted to a peer using routing_table.account_owner.
Upon receiving the message, the PeerManagerActor will sign it
(https://github.com/near/nearcore/blob/master/chain/network/src/peer_manager.rs#L1285)
and convert it into a RoutedMessage (which also has things like TTL, etc.).
Then it will use the routing_table to find the route to the target peer (adding
route_back if needed) and send the message over the network as
PeerMessage::Routed. Details about routing table computation are covered in
section 9. A simplified sketch of the routing decision is shown below.
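This std-only sketch only mirrors the names `account_owner` and `peer_forwarding` from the text; everything else (types, the first-hop choice, TTL handling being omitted) is a simplifying assumption.

```rust
use std::collections::HashMap;

type PeerId = u32;
type AccountId = String;

enum Target {
    Account(AccountId),
    Peer(PeerId),
}

struct RoutingTable {
    /// account_id -> PeerId owning that account.
    account_owner: HashMap<AccountId, PeerId>,
    /// Destination -> direct neighbours on shortest paths (section 9).
    peer_forwarding: HashMap<PeerId, Vec<PeerId>>,
}

impl RoutingTable {
    /// Resolve the target to a PeerId, then pick a first hop towards it.
    fn next_hop(&self, target: &Target) -> Option<PeerId> {
        let destination = match target {
            Target::Account(account_id) => *self.account_owner.get(account_id)?,
            Target::Peer(peer_id) => *peer_id,
        };
        self.peer_forwarding.get(&destination)?.first().copied()
    }
}

fn main() {
    let routing_table = RoutingTable {
        account_owner: HashMap::from([("alice.near".to_string(), 7)]),
        peer_forwarding: HashMap::from([(7, vec![2, 3])]),
    };
    let hop = routing_table.next_hop(&Target::Account("alice.near".into()));
    assert_eq!(hop, Some(2)); // forward the RoutedMessage to peer 2
}
```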
When a Peer receives this message (as PeerMessage::Routed), it will pass it to
the PeerManager (as RoutedMessageFrom), which then checks whether the message is
for the current PeerActor (if yes, it passes it on to the client) and, if
not, passes it along the network.
All these messages are handled by receive_client_message in Peer
(NetworkClientMessages) and transferred to ClientActor
(chain/client/src/client_actor.rs).
NetworkRequests sent to the PeerManager actor trigger the RawRoutedMessage for
messages that are meant to be sent to another peer.
lib.rs (ShardsManager) has a network_adapter, coming from the client's
network_adapter, which comes from ClientActor, which comes from the start_client
call, which comes from start_with_config (which creates the PeerManagerActor
that is passed as the target to network_recipient).
12. Database
12.1 Storage of deleted edges
Every time a group of peers becomes unreachable at the same time, we store the edges belonging to them in components. We remove all of those edges from memory and save them to the database. If any of those peers become reachable again, we re-add their edges. This is useful in case there is a network split, to recover edges if needed.
Each component is assigned a unique nonce, where the first one is assigned nonce
0. Each new component gets assigned a consecutive integer.
To store components, we have the following columns in the DB (a sketch of this
bookkeeping is shown below):
- DBCol::LastComponentNonce - stores component_nonce: u64, the last used nonce.
- DBCol::ComponentEdges - mapping from component_nonce to a list of edges.
- DBCol::PeerComponent - mapping from peer_id to the last component nonce it belongs to.
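In this std-only sketch, HashMaps stand in for the three DB columns and the method names are illustrative; only the nonce assignment and the peer-to-component/component-to-edges mappings follow the text.

```rust
use std::collections::HashMap;

type PeerId = u32;
#[derive(Debug)]
struct Edge(PeerId, PeerId);

struct ComponentStore {
    /// DBCol::LastComponentNonce (None until the first component is stored).
    last_component_nonce: Option<u64>,
    /// DBCol::ComponentEdges: component nonce -> edges.
    component_edges: HashMap<u64, Vec<Edge>>,
    /// DBCol::PeerComponent: peer -> last component nonce it belonged to.
    peer_component: HashMap<PeerId, u64>,
}

impl ComponentStore {
    /// Persist the edges of a group of peers that became unreachable together.
    fn store_component(&mut self, peers: &[PeerId], edges: Vec<Edge>) -> u64 {
        let nonce = self.last_component_nonce.map_or(0, |n| n + 1); // first is 0
        self.last_component_nonce = Some(nonce);
        self.component_edges.insert(nonce, edges);
        for &peer in peers {
            self.peer_component.insert(peer, nonce);
        }
        nonce
    }

    /// When a stored peer becomes reachable again, take its component's
    /// edges back out so they can be re-added to memory.
    fn load_component_for(&mut self, peer: PeerId) -> Option<Vec<Edge>> {
        let nonce = self.peer_component.remove(&peer)?;
        self.component_edges.remove(&nonce)
    }
}

fn main() {
    let mut store = ComponentStore {
        last_component_nonce: None,
        component_edges: HashMap::new(),
        peer_component: HashMap::new(),
    };
    store.store_component(&[5, 6], vec![Edge(5, 6)]);
    assert_eq!(store.load_component_for(5).unwrap().len(), 1);
}
```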
12.2 Storage of account_id to peer_id mapping
ColAccountAnnouncements -> stores a mapping from account_id to the tuple
(account_id, peer_id, epoch_id, signature).