Bitcoin P2P Network
This lecture focused on the Bitcoin peer-to-peer (P2P) network. The lecture only covered the current network and its implementation, which is what I will discuss here.
Questions answered in this Post:
- What are characteristics of the blockchain network?
- How do you as an individual connect to the blockchain network?
- What is a full node and what is an SPV client?
- What is the size of the network?
What is the blockchain network and how can people join this network?
The Bitcoin P2P network is quite similar to other peer-to-peer networks. Outside of BitTorrent, the only P2P networks I know are Gnutella and ed2k (eDonkey). I remember that Gnutella had many issues with scalability and message propagation. As in other peer-to-peer networks, all the nodes are equal and there is no hierarchy. The network runs over TCP (Transmission Control Protocol) and has a random topology. Anyone can join the network and leave it as well. Leaving is easy: if the other nodes don’t hear from a node for three hours, they assume it is no longer online and stop sending messages to it.
Again, I’m talking about the main Bitcoin P2P network; as I’ll discuss later, this notion of equality does not hold for the other side networks around Bitcoin.
Key Characteristics
- ad-hoc protocol (runs on TCP port 8333)
- ad-hoc network with random topology (random nodes peering)
- all nodes are equal (no central/master)
- new nodes can join at any time (anyone can download and get started)
- forget non-responding nodes after 3 hours
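The "forget after 3 hours" rule can be sketched as a small peer table. This is only an illustration: `PeerTable`, its method names, and the example addresses are made up for this post and are not taken from any Bitcoin client.

```python
import time

FORGET_AFTER_SECONDS = 3 * 60 * 60  # the three-hour window described above


class PeerTable:
    """Hypothetical in-memory table of known peers, keyed by "host:port"."""

    def __init__(self):
        self.last_heard = {}

    def heard_from(self, address, now=None):
        """Record that a peer was heard from at time `now` (in seconds)."""
        self.last_heard[address] = time.time() if now is None else now

    def prune(self, now=None):
        """Drop peers that have been silent for longer than the window."""
        now = time.time() if now is None else now
        stale = [addr for addr, seen in self.last_heard.items()
                 if now - seen > FORGET_AFTER_SECONDS]
        for addr in stale:
            del self.last_heard[addr]
        return stale


table = PeerTable()
table.heard_from("203.0.113.5:8333", now=0)        # documentation-range IPs
table.heard_from("198.51.100.7:8333", now=9_000)
forgotten = table.prune(now=11_000)                # 11,000 s > 10,800 s
print(forgotten)
```

Only the first peer is forgotten here, since it has been silent for longer than three hours while the second was heard from recently.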
What does it mean that anyone can join at any time?
Well, anyone can download something like Bitcoin Core, or use npm to install Bitcore, and become a full node. As with any peer-to-peer network, pay attention to security, bandwidth, and disk space concerns.
How does a new node connect to the network?
The simple answer is to connect to one node, and more will follow. There are a few more steps than just that, though.
- Connect to a seed node with a message like, “Hello World! I’m ready to Bitcoin”. Seed nodes are hard-coded IP addresses that one can use to connect to an active node. Instead of hard-coded IP addresses, some programs use DNS seeds, which let you look up the IP addresses of active nodes rather than relying on a fixed one. A few DNS seed names are bitseed.xf2.org and seed.bitcoin.sipa.be.
- To first connect, send a version message and receive a version message back. Then send a verack to confirm the connection.
- Send a getaddr message to the seed node and receive an addr message back, listing nodes it knows about
- Next, connect to the nodes that the seed node sends you
- Repeat with the new nodes to be better connected.
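The steps above can be sketched as a small simulation. Nothing here touches the real network: `ADDR_BOOK`, `handshake`, `getaddr`, and `discover` are hypothetical stand-ins for the version/verack exchange and the getaddr/addr exchange, and the node names are invented.

```python
from collections import deque

# Hypothetical address books: what each node would return in an addr message.
ADDR_BOOK = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["a"],
    "d": [],
}


def handshake(peer):
    """Stand-in for the version / verack exchange; succeeds for known nodes."""
    return peer in ADDR_BOOK


def getaddr(peer):
    """Stand-in for sending getaddr and reading the addr reply."""
    return ADDR_BOOK[peer]


def discover(seed, max_peers=8):
    """Connect to the seed, then keep asking new peers for more addresses."""
    connected = []
    queue = deque([seed])
    known = {seed}
    while queue and len(connected) < max_peers:
        peer = queue.popleft()
        if not handshake(peer):
            continue
        connected.append(peer)
        for addr in getaddr(peer):
            if addr not in known:      # only queue addresses we have not seen
                known.add(addr)
                queue.append(addr)
    return connected


peers = discover("seed")
print(peers)
```

Starting from a single seed, the node ends up connected to every reachable node in this toy address book. The `max_peers` cap is an assumption of the sketch, there to show that a node does not have to connect to everything it hears about.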
What happens in the network?
Transactions that one node hears are shared across the entire network. This is called transaction propagation (flooding), and it works as a simple gossip protocol: each node sends the message to every node it knows. At short intervals, a message gets sent to random targets in a pairwise fashion, and each time the receiving node is responsible for updating its view of the blockchain and deciding whether to send the transaction onward. Each node keeps its own list of pending transactions and decides whether to forward a transaction based on a set of criteria. Also, like a breadth-first search, it checks whether it has seen a transaction before, which prevents messages from being relayed forever. According to bitcoin.stackexchange, it takes about 15 seconds for a message to be propagated.
There is a set of checks to determine whether a transaction should be propagated. Note that these checks are not enforced: nodes can ignore them if they have different incentives or are malicious. The first check is that the transaction is valid against the current blockchain. This includes syntactic correctness, a size in bytes below MAX_BLOCK_SIZE, and output values within the legal monetary range. Then the node checks whether it has seen the transaction before, which it can look up in its pending transactions list. It also needs to check that the transaction has not already been included in a block and that its inputs have not already been spent. This site has the documented protocol rules.
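To make the breadth-first comparison concrete, here is a minimal flooding sketch over a made-up five-node topology. `PEERS` and `flood` are hypothetical names for this post; the real network relays asynchronously over TCP rather than walking a queue like this.

```python
from collections import deque

# Hypothetical topology: node -> peers it is connected to.
PEERS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def flood(origin, txid):
    """Relay txid from origin; return the nodes reached and messages sent."""
    seen = {node: set() for node in PEERS}   # per-node "already seen" set
    seen[origin].add(txid)
    queue = deque([origin])
    messages = 0
    while queue:
        node = queue.popleft()
        for peer in PEERS[node]:
            messages += 1                    # one message per forward
            if txid in seen[peer]:
                continue                     # duplicate: drop, do not re-relay
            seen[peer].add(txid)
            queue.append(peer)
    reached = sorted(n for n in PEERS if txid in seen[n])
    return reached, messages


reached, messages = flood("A", "tx1")
print(reached, messages)
```

Every node hears the transaction, and the "seen before" check is what makes the loop terminate: each node forwards a given transaction at most once, so the message count is bounded by the number of connections.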
What are some checks done to see if the node should propagate the message?
- transaction is valid with the current blockchain
- by default, the script matches a whitelist (avoid unusual scripts)
- won’t relay by default (Why not)
- haven’t seen before (avoid infinite loops)
- doesn’t conflict with other transactions previously relayed (avoid double-spends)
- Documented protocol rules
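The checks above can be sketched as one function that returns a decision and a reason. This is a simplified illustration, not the rules of any real client: `Tx`, `Node`, and `should_relay` are hypothetical, the 1,000,000-byte value for `MAX_BLOCK_SIZE` is an assumption of the sketch, and the script whitelist check is left out entirely.

```python
from dataclasses import dataclass, field

MAX_BLOCK_SIZE = 1_000_000       # bytes; value assumed for this sketch
MAX_MONEY = 21_000_000 * 10**8   # 21 million BTC expressed in satoshis


@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple     # outpoints being spent, e.g. ("f00:0",)
    outputs: tuple    # output values in satoshis
    size: int         # serialized size in bytes


@dataclass
class Node:
    utxo: set                                          # unspent outpoints
    pending: dict = field(default_factory=dict)        # txid -> Tx
    spent_by_pending: set = field(default_factory=set)

    def should_relay(self, tx):
        """Return (decision, reason) for a proposed transaction."""
        if not tx.inputs or not tx.outputs or tx.size >= MAX_BLOCK_SIZE:
            return False, "malformed or too large"
        if any(v < 0 or v > MAX_MONEY for v in tx.outputs):
            return False, "output out of range"
        if tx.txid in self.pending:
            return False, "already seen"
        if any(i not in self.utxo for i in tx.inputs):
            return False, "input missing or already spent"
        if any(i in self.spent_by_pending for i in tx.inputs):
            return False, "conflicts with relayed transaction"
        # Accepted: remember it so later duplicates and conflicts are caught.
        self.pending[tx.txid] = tx
        self.spent_by_pending.update(tx.inputs)
        return True, "ok"


node = Node(utxo={"f00:0", "f00:1"})
pay_alice = Tx("t1", ("f00:0",), (5_000,), 250)
pay_bob = Tx("t2", ("f00:0",), (5_000,), 250)   # spends the same outpoint
print(node.should_relay(pay_alice))
print(node.should_relay(pay_alice))
print(node.should_relay(pay_bob))
```

The first call is relayed, the second is rejected as already seen, and the third is rejected as a conflict. Since the checks are not enforced, a node with different incentives could simply skip any of these branches.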
It is possible for nodes to end up with different sets of pending transactions, or with a different ordering of the same transactions. This is called a race condition in Bitcoin. Because only one miner defines the next block, that miner breaks the race by publishing the block. This usually makes it clear how to resolve the race: the conflicting transaction (or chain) gets dropped, because it would be a double spend once the block has been published. Nodes will usually accept the transaction that they received first. A similar algorithm is used for block propagation as well, where more information is found here. You may wonder what happens to the transactions or blocks that don’t get put on the main blockchain. They are called orphan transactions and orphan blocks, respectively. An orphan block does not have a parent on the longest blockchain. According to Blockchain.info, about 2-3 orphan blocks are created per week.
Race conditions: Transaction or blocks may conflict
- default behavior: accept what you hear first
- network position matters as a result
- miners have freedom to implement their own logic which could exacerbate these race conditions
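A tiny sketch of the default "accept what you hear first" behavior shows why network position matters. `first_seen_pool` and the transaction names are made up for illustration; each arrival is a pair of a transaction id and the outpoint it spends.

```python
def first_seen_pool(arrivals):
    """Keep each transaction unless it spends an already-claimed outpoint."""
    claimed = {}   # outpoint -> txid that claimed it first
    pool = []
    for txid, outpoint in arrivals:
        if outpoint in claimed:
            continue              # conflicting spend heard later: ignored
        claimed[outpoint] = txid
        pool.append(txid)
    return pool


# Two conflicting spends of the same (hypothetical) outpoint "f00:0".
to_alice = ("pay_alice", "f00:0")
to_bob = ("pay_bob", "f00:0")

node_near_alice = first_seen_pool([to_alice, to_bob])
node_near_bob = first_seen_pool([to_bob, to_alice])
print(node_near_alice, node_near_bob)
```

The two nodes hear the same pair of transactions in a different order and end up with different pending sets. The race is only settled when a miner publishes a block containing one of the two.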
Now that we know what the network is doing, what is the size of the network?
While it is not clear how to measure it, there are between 1,000 and 10,000 fully validating nodes. A fully validating node is one that is permanently connected, stores the entire blockchain, and hears and forwards every block and transaction. These nodes also need to track the unspent transaction output (UTXO) set, which is every transaction output that has not yet been spent. However, some nodes connect in and out of the network, maybe just to complete a transaction or check the status of one. In July 2014, the size of the blockchain was 20 GB. Now, in March 2017, it’s almost 100 GB. The UTXO set was only 20 MB in July 2014; in July 2015, it was 650 MB.
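To show what tracking the UTXO set involves, here is a minimal sketch in which the set is a dictionary from (txid, output index) to a value. The function name, the transaction ids, and the amounts are hypothetical, and signatures and scripts are ignored.

```python
def apply_transaction(utxo, txid, inputs, outputs):
    """Spend `inputs` (outpoints) and add `outputs` (values) to the set."""
    missing = [op for op in inputs if op not in utxo]
    if missing:
        raise ValueError(f"unknown or already spent: {missing}")
    for op in inputs:
        del utxo[op]                      # a spent output leaves the set
    for index, value in enumerate(outputs):
        utxo[(txid, index)] = value       # each new output joins the set


utxo = {("coinbase1", 0): 50}
apply_transaction(utxo, "tx1", [("coinbase1", 0)], [30, 20])
apply_transaction(utxo, "tx2", [("tx1", 0)], [30])
print(utxo)
```

After both transactions, only the outputs that nobody has spent remain, which is why a full node can check for double spends against this set instead of rescanning the whole chain.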
The lecturer, Joseph, mentioned that the number of full nodes is decreasing. That makes sense: as time passes, storing the chain takes more disk space and RAM. Unless you are a miner, or part of some large organization that actively benefits from maintaining a full node, it doesn’t seem reasonable to keep running one. I admit there are people who will keep running nodes because they believe in Bitcoin, and for those people, that’s awesome. When people have clients running on their phones or PCs, it is likely just a lightweight node. People also refer to these nodes as Simplified Payment Verification (SPV) clients. Bitcoin wallet programs tend to incorporate SPV nodes. A lightweight node stores only the subset of transactions it needs in order to verify the transactions it cares about. These lightweight nodes only work because they trust the fully validating nodes to do their job. There has been much discussion on the internet about how many full nodes are enough and who should run a full node.