A Blockchain Dilemma: Chain Forks, Catastrophic Re-orgs, and Insurance

Forks often mark pivotal moments in the history of a blockchain network. Ethereum’s hard fork after the notorious DAO hack in 2016 created Ethereum Classic, and Bitcoin’s numerous contentious forks have produced upwards of six different iterations of the Bitcoin network and its currency. As permissionless blockchain networks become more ubiquitous and decentralized — with far-flung actors and communities with differing intentions all over the world often sharing the same network — the underlying assumptions that maintain a chain’s integrity are likely to be tested.

This raises a number of questions about forking and decentralized networks, perhaps most saliently: When is forking a chain the economically sensible thing to do? Say a single project accounts for 80% of the volume on a particular chain, denominated in that project’s own token. At what point should the project fork the state and try to capture the premium of the native chain as well? If the project in question has its own wallet and user base with little overlap with the native chain, are the native chain fees paying for something meaningful, or just lining the pockets of the native coin holders?

What a native chain “sells” is decentralization. Decentralization is a mix of a number of important factors: distribution of native coin ownership, geographic distribution, the security of the individual validators, the stickiness of stakes, how closely participants pay attention to governance, and a host of other more nebulous elements. Crucially, these decentralization factors are directly related to the likelihood of a catastrophic loss of consistency — i.e., a re-org.

In the centralized world, institutions like banks in the United States pay insurance premiums to the Federal Deposit Insurance Corporation, and spend enormous sums on regulatory compliance designed to ensure the consistency of their internal ledgers. Sufficient decentralization allows a chain to reduce the costs of making sure that the accounting system is consistent and cannot be corrupted. But even for a public chain as decentralized as Bitcoin, this may not always be possible.

An example: When Binance was hacked for $40 million worth of Bitcoin earlier this year, the centralized exchange could have published the private keys of the stolen Bitcoin and let the global, decentralized community of miners decide whether they wanted to attempt a four-day re-org of Bitcoin.

Binance CEO CZ mused on this very subject on Twitter:

“cons: 1 we may damage credibility of BTC, 2 we may cause a split in both the Bitcoin network and community. Both of these damages seems to out-weight $40m revenge. 3 the hackers did demonstrate certain weak points in our design and user confusion, that was not obvious before”

Ultimately Binance decided not to go ahead with this tactic, because they felt it would split the community and undermine confidence in Bitcoin as a decentralized entity. However, this doesn’t mean that in the future we won’t see some hacked third parties attempt this approach. Custodians may even be legally obligated to do it by their insurance company.

A project whose success is contingent on Bitcoin re-orgs never occurring after 1 hour — roughly 6 confirmations — effectively requires insurance against any re-org deeper than that. This applies to Proof of Stake networks as well. No protocol can prevent a 33%+ slashing event or a hard fork due to a malicious partition. Imagine that some well-meaning custodian and staking pool offers negative fees at the right moment, goes viral, and captures more than 33% of the stake. Getting 33% of the stake doesn’t require an economic attack that purchases 33% of the coin; it simply requires understanding and leveraging human greed.

Now imagine that this dominant pool — in control of the network — stores its validator keys in plain text, or fires the wrong engineer at the wrong time. If you think this scenario is unlikely, just imagine a world where custodians store keys as securely as our financial institutions store our data and credit card numbers.

In a mature cryptocurrency ecosystem, projects will likely buy insurance against re-orgs. A project can then make a financial decision: the upfront and ongoing costs of forking, plus the price of re-org insurance (risk) on its own chain, versus native fees plus current re-org insurance costs. To me, this looks like the classic split between vertically integrating your business so you own the entire stack versus specializing and focusing on your own horizontal.

In theory, a chain with 80% of usage from a single project and 20% from long-tail projects should be more decentralized than the chain that would result from a fork. In a healthy business environment, projects should start coalescing around a few chains such that the overall cost of re-org insurance for everyone involved drops. Re-org insurance does get cheaper with more decentralization, but this process should also hit diminishing returns, at which point chains will compete on other factors.

Chains with slashing could provide their own insurance market for re-orgs priced in the native coin itself. A project would likely choose this over any off-chain insurance as it does not require an oracle. A contract on the native chain could pay out the insurance bonds automatically if a proof of a re-org is presented to the contract. Slashing also guarantees that the network burns the number of tokens used to cause the re-org, which should cause the value of the remaining tokens to increase.
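To make the mechanism concrete, here is a minimal sketch of such a payout contract in Python. Everything here is hypothetical — the class and function names, and the simplified proof format (two conflicting finalized blocks at the same height) — and a real slashing-based design would also verify validator signatures and finality on-chain.

```python
# Toy sketch of an on-chain re-org insurance contract. All names and the
# proof format are hypothetical illustrations of the idea in the text.

from dataclasses import dataclass

@dataclass(frozen=True)
class SignedBlock:
    height: int       # slot/height at which the block was finalized
    block_hash: str   # hash of the finalized block
    validator: str    # identity of the signing validator

class ReorgInsuranceContract:
    def __init__(self, bonds: dict):
        # bonds maps policyholder -> amount of native coin owed on a re-org
        self.bonds = dict(bonds)
        self.paid_out = False

    def is_reorg_proof(self, a: SignedBlock, b: SignedBlock) -> bool:
        # Two distinct finalized blocks at the same height is treated here
        # as proof of a catastrophic loss of consistency.
        return a.height == b.height and a.block_hash != b.block_hash

    def claim(self, a: SignedBlock, b: SignedBlock) -> dict:
        # Pay every bond automatically once a valid proof is presented.
        if self.paid_out or not self.is_reorg_proof(a, b):
            return {}
        self.paid_out = True
        return dict(self.bonds)
```

Because both the proof and the payout live on the native chain, no external oracle is needed — which is exactly the advantage the post describes.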

Using a native coin is attractive because the remaining 66% of the stake can serve as insurance collateral and earn an additional return based on volume, while also guaranteeing that the price per token increases to compensate for the re-org, whereas a non-native asset may drop in value.
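The price-compensation claim rests on simple supply arithmetic. A toy illustration, with invented numbers and the (strong) assumption that the network’s total value is unchanged by the event:

```python
# Toy arithmetic for the burn-compensation claim. All numbers are invented,
# and the assumption that network value is unchanged is exactly the
# assumption the claim in the text relies on.

network_value = 1_000_000_000    # assumed total market value, held constant
supply = 100_000_000             # token supply before the attack

price_before = network_value / supply             # 10.0 per token

burned = supply // 3                              # 33% stake slashed and burned
price_after = network_value / (supply - burned)   # ~15.0 per token
```

Under these assumptions each surviving token represents a larger share of the network, so collateral denominated in the native coin appreciates precisely when the payout is owed.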

As we’ve already witnessed through a number of contentious forks on notable networks, the risk of malicious actors attempting to take control of networks is already very real. The issue of catastrophic re-orgs and insurance against them will only grow in importance as blockchain usage increases. Chain-native insurance via slashing, as described above, presents an opportunity to create positive outcomes for networks looking to protect against this phenomenon, and is an area to which the blockchain industry must devote greater attention.


Inside Solana’s Internal Scalability Test

The Solana team recently undertook an internal scalability test to confirm the limits of the Solana blockchain network’s performance capabilities. The test was executed across 200 distributed CPU nodes in 23 regions throughout the world, spanning five continents.

While GPU-based testnets of the Solana network have exhibited peak speeds upwards of 100,000 transactions per second, this internal scalability test used retail-level CPUs only, to establish a lower baseline of throughput while maximizing accessibility across the validator population.

The internal scalability test analyzed a number of vectors required to achieve scalability, including transactions per second (both mean and maximum), total transactions and votes, and confirmation times (mean, median, minimum, and maximum).

The results of the test help present a more complete picture of the Solana network’s minimum capacities as we head further towards Mainnet launch. As the Solana infrastructure pipelines transactions through CPU and GPU phases and this particular test solely utilized CPU, the statistics below are considered a lower baseline representation of Solana’s throughput capacity. We expect a fully operative and functional iteration of the Solana Mainnet to exhibit far more robust results.

Below are the results of the Solana internal scalability test:

Mean transactions per second: 29,171

Maximum transactions per second: 44,838

Maximum transactions per second peak: 47,370

Total transactions: 3,266,989

Total votes: 36,720

Mean confirmation time: 2.34 seconds

Median confirmation time: 2.34 seconds

Minimum confirmation time: 1.26 seconds

Maximum confirmation time: 3.37 seconds

Confirmation average: 2.575 seconds

The test indicates that, with a maximum TPS of 47,370, the Solana network is operating very close to the forecasted Mainnet launch capacity of 50,000 transactions per second, even without GPU-based processing. Block confirmation times maintained an average of 2.34 seconds, while dropping to just 1.26 seconds at minimum. The network completed approximately 3.27 million transactions over the course of the test, with a final mean of 29,171 transactions per second.

These figures indicate a highly successful test run for the Solana network as Mainnet launch comes quickly into view. However, perhaps even more exciting to the team and the Solana community-at-large is witnessing many of the platform’s key innovations working seamlessly together in achieving the technical specifications required for success.

For example: The test dashboard clearly indicates node leadership (in green) changing frequently and asynchronously, with validators (in black) following. This unique mechanism is made possible by Proof of History (PoH), Solana’s permissionless source of time, which unlocks essential functionality in block time (800ms), block propagation (log200(n)), throughput (50–80K TPS), and ledger storage through interactions with Solana’s other core innovations.

Proof of History makes the Tower BFT consensus algorithm possible, which in turn facilitates block optimization via Turbine, and mempool management through Gulf Stream. Transactional throughput is then further streamlined via the Pipelined Transaction Processing Unit, while data storage is maximized with Cloudbreak.

The result of all of Solana’s interconnected innovations is what the scalability test indicates — the world’s first web-scale blockchain. For a deeper understanding of how all of these innovations work together, head to the Solana blog to read ‘8 Innovations that Make Solana the World’s First Web Scale Blockchain’.

Solana’s software philosophy has always been to get out of the way and let the hardware operate at capacity, and the internal scalability test shows this philosophy in action. Solana scales naturally with bandwidth, SSDs, and GPU cores. It is the only blockchain that does, and that is how Solana achieves 50,000 TPS on a network of 200 physically distinct nodes around the world.

The next phase of testing in the leadup to Mainnet launch is Tour de SOL, a public beta that incentivizes validators to run nodes — analogous to Cosmos’ Game of Stakes — challenging the public at large to test the limits of the Solana network while earning tokens for doing so.



Pipelining in Solana: The Transaction Processing Unit

To get to sub-second confirmation times and the transactional capacity required for Solana to become the world’s first web-scale blockchain, it’s not enough to just form consensus quickly. The team had to develop a way to quickly validate massive blocks of transactions, while quickly replicating them across the network. To achieve this, the process of transaction validation on the Solana network makes extensive use of an optimization common in CPU design called pipelining.

Pipelining is an appropriate process when there’s a stream of input data that needs to be processed by a sequence of steps and there’s different hardware responsible for each. The quintessential metaphor to explain this is a washer and dryer that wash/dry/fold several loads of laundry in sequence. Washing must occur before drying and drying before folding, but each of the three operations is performed by a separate unit.

To maximize efficiency, one creates a pipeline of stages. We’ll call the washer one stage, the dryer another, and the folding process a third. To run the pipeline, one adds a second load of laundry to the washer just after the first load is added to the dryer. Likewise, the third load is added to the washer after the second is in the dryer and the first is being folded. In this way, one can make progress on three loads of laundry simultaneously. Given infinite loads, the pipeline will consistently complete a load at the rate of the slowest stage in the pipeline.
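The slowest-stage rule from the laundry analogy is easy to verify with a quick sketch (stage durations are invented, in minutes):

```python
# Sketch of the laundry analogy: without pipelining each load occupies
# every stage before the next load starts; with pipelining, after the first
# load fills the pipeline, one load completes per slowest-stage interval.

def sequential_time(stage_times, loads):
    # Each load runs wash+dry+fold to completion before the next begins.
    return sum(stage_times) * loads

def pipelined_time(stage_times, loads):
    # First load traverses the whole pipeline; every later load finishes
    # max(stage_times) after the one before it -- the slowest stage sets
    # the steady-state rate.
    return sum(stage_times) + (loads - 1) * max(stage_times)

wash, dry, fold = 30, 40, 20   # hypothetical stage durations
print(sequential_time([wash, dry, fold], 10))  # 900
print(pipelined_time([wash, dry, fold], 10))   # 90 + 9*40 = 450
```

The same arithmetic is why Solana’s TPU throughput is governed by its slowest stage — which is exactly why signature verification gets offloaded to the GPU, as described below.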

“We needed to find a way to keep all hardware busy all the time. That’s the network cards, the CPU cores and all the GPU cores. To do it, we borrowed a page from CPU design”, explains Solana Founder and CTO Greg Fitzgerald. “We created a four stage transaction processor in software. We call it the TPU, our Transaction Processing Unit.”

On the Solana network, the pipeline mechanism — the Transaction Processing Unit — progresses through Data Fetching at the kernel level, Signature Verification at the GPU level, Banking at the CPU level, and Writing back at the kernel level. By the time the TPU starts to send blocks out to the validators, it has already fetched the next set of packets, verified their signatures, and begun crediting tokens.

The Validator node simultaneously runs two pipelined processes, one used in leader mode called the TPU and one used in validator mode called the TVU. In both cases, the hardware being pipelined is the same, the network input, the GPU cards, the CPU cores, writes to disk, and the network output. What it does with that hardware is different. The TPU exists to create ledger entries whereas the TVU exists to validate them.

“We knew that signature verification was going to be a bottleneck, but also that it’s this context-free operation that we could offload to the GPU,” says Fitzgerald. “Even after offloading this most expensive operation, there’s still a number of additional bottlenecks, such as interacting with the network drivers and managing the data dependencies within smart contracts that limit concurrency.”

Thanks to GPU parallelization in this four-stage pipeline, at any given moment the Solana TPU can be making progress on 50,000 transactions simultaneously. “This can all be achieved with an off-the-shelf computer for under $5,000,” explains Fitzgerald. “Not some supercomputer.”

With GPU offloading in Solana’s Transaction Processing Unit, the network can extract maximum efficiency from each individual node. Achieving this has been Solana’s goal since inception.

“The next challenge is to get the blocks from the leader node out to all the validator nodes, and to do it in a way that doesn’t congest the network and bring throughput to a crawl,” continues Fitzgerald. “For that, we’ve come up with a block propagation strategy that we call Turbine.

“With Turbine, we structure the Validator nodes into multiple levels, where each level is at least twice the size of the one above it. By having this structure, these distinct levels, confirmation time ends up being proportional to the height of the tree and not the number of nodes in it, which is far greater. Every time the network doubles in size, you’ll see a small bump in confirmation time, but that’s it.”
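The height-versus-size claim above can be checked with a few lines. The fanout values below are illustrative; the post elsewhere cites a propagation cost of log200(n), i.e. a fanout on the order of 200:

```python
# Depth of a Turbine-style propagation tree where each level holds `fanout`
# times as many nodes as the one above it. Fanout values are illustrative.

def turbine_depth(num_nodes: int, fanout: int) -> int:
    # Count levels until 1 + fanout + fanout^2 + ... covers every node.
    depth, covered, level_size = 0, 0, 1
    while covered < num_nodes:
        covered += level_size
        level_size *= fanout
        depth += 1
    return depth

# Doubling the network adds roughly one level, not double the time:
assert turbine_depth(200, fanout=2) == 8
assert turbine_depth(400, fanout=2) == 9
# With a large fanout the tree stays very shallow even for huge networks:
assert turbine_depth(40_000, fanout=200) == 3
```

Confirmation time tracks the tree’s depth, so it grows logarithmically with node count rather than linearly — the “small bump” per doubling that Fitzgerald describes.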

In addition to Pipelining, several other key innovations make Solana’s web-scale blockchain functionality possible. For a deeper understanding of them all, read ‘8 Innovations that Make Solana the World’s First Web Scale Blockchain’ on the Solana blog.



Solana Now Supports Libra’s Move VM

Solana is a high performance blockchain with 400ms blocks and a flexible runtime that allows computation to scale with validator hardware. On current iterations of the Solana Testnet, a network of 200 physically distinct nodes supports a sustained throughput of more than 50,000 transactions per second when running with GPUs. We believe it to be the most performant blockchain in the world. 

Facebook’s Libra project is notable for a number of reasons that have already been discussed by the blockchain community in some depth: It’s a blockbuster project built by some of the brightest minds in technology, backed by some of the largest businesses in the world, and will likely be a major boon for the adoption of blockchain technology. But the Solana team in particular noted that Libra’s bespoke Move smart contract language separates shared data from the smart contract code that modifies it.

We found this particularly interesting because the Solana team made the same design decision in our runtime, Pipeline. We recognized immediately that Move is a smart contract language that could not only scale, but is naturally compatible with Solana. This suggested to us that Move code could run on Solana and take full advantage of the network’s highly optimized environment.

Just two weeks later, Solana co-founder Stephen Akridge posted that he was able to execute Libra’s peer-to-peer payment transactions on Solana.

A few days later, we got to work on integration and benchmarking, and through a combination of great language design from the Libra team and meticulous optimization from the Solana team, we are proud to announce support for the Move VM on Solana. What this means is that projects and applications built with Move are compatible with Solana, while being able to utilize the exceptional transactional speed and capacity of the Solana network. 

Move VM Bottlenecks

Pipeline, the Solana transaction processing runtime, allows for parallel execution of transactions across horizontally scaled compute and storage. We suspected that the main difference between executing Move and native Solana transactions would be the additional overhead of interpreting Move bytecode in the Move VM running on Solana.

Below is the benchmarking for the Move VM on Solana:



In our profile, the total execute time is 684µs, and 605µs of that is spent in the `vm_execute` function, which runs the bytecode interpreter. This confirmed our expectations. We consider the results of this benchmarking truly remarkable, because Pipeline can scale parallel execution of transactions across as many CPU cores as are available to the validator.

Solana native programs take 1µs in `vm_execute`, since their instructions are native x86. The Move VM is thus nearly 700x slower than native Solana programs, and even at this speed we are able to demonstrate how well Pipeline handles horizontal scaling across CPUs.

Increasing the CPU count scales throughput linearly. We have no doubt that the initial VM performance can be dramatically improved with a JIT, or an LLVM front end to allow for direct compilation to a native instruction set.
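Some back-of-the-envelope arithmetic from the figures above. Reading 684µs as the per-transaction execute time is our assumption, and the 32-core count matches the n1-standard-32 machines used in the benchmark; treat the results as rough illustrations rather than measured throughput:

```python
# Rough arithmetic from the profile figures. Assumes 684us is the execute
# time per Move transaction and that throughput scales linearly with cores.

MOVE_EXECUTE_SECONDS = 684e-6    # profiled Move execute time (assumed per-tx)
NATIVE_EXECUTE_SECONDS = 1e-6    # profiled native program execute time

per_core_move_tps = 1 / MOVE_EXECUTE_SECONDS              # ~1,462 tx/s per core
slowdown = MOVE_EXECUTE_SECONDS / NATIVE_EXECUTE_SECONDS  # ~684x, the "nearly 700x" above
tps_32_cores = 32 * per_core_move_tps                     # ~46,800 tx/s under linear scaling
```

Even a naive interpreter, in other words, lands in the tens of thousands of TPS once Pipeline spreads it across a modest core count — before any JIT or LLVM work.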

Move Raw TPS

These benchmarks are run on our testnet with 400ms block times:

Setup: 4 nodes on GCP n1-standard-32 instances with 2x P100 GPUs

CPU bank threads | Mean TPS | Max TPS

Libra’s Move VM will be used by a huge number of developers all over the world as the project attempts to funnel much of Facebook’s enormous market toward cryptocurrency and blockchain applications. That Move is so readily compatible with Solana, and capable of substantial further optimization when it leverages the core innovations of the Solana network, is a highly encouraging sign. It suggests that projects built with Move can be ported to Solana relatively simply, and reap the benefits of speed and transactional throughput.

This proves that although the Move VM runtime is not optimized today, it can be integrated relatively easily into Solana’s chain, where parallel performance can be extracted. We look forward to giving developers a choice of Move environments so they are not locked into a single vendor. We believe that the result will be a more robust and functional blockchain ecosystem. 

If you’re a web developer or working on blockchain and smart contracts, the Move programming language has a lot to offer. You can check out Solana’s Move SDK example here, and run benchmarking for the Move VM on Solana here. If you’d like to learn more about Solana and the core innovations that make it the world’s first web-scale blockchain, this blog post is a great place to start.