
Pipelining in Solana: The Transaction Processing Unit

To get to sub-second confirmation times and the transactional capacity required for Solana to become the world’s first web-scale blockchain, it’s not enough to just form consensus quickly. The team had to develop a way to quickly validate massive blocks of transactions, while quickly replicating them across the network. To achieve this, the process of transaction validation on the Solana network makes extensive use of an optimization common in CPU design called pipelining.

Pipelining is an appropriate process when there’s a stream of input data that needs to be processed by a sequence of steps and there’s different hardware responsible for each. The quintessential metaphor to explain this is a washer and dryer that wash/dry/fold several loads of laundry in sequence. Washing must occur before drying and drying before folding, but each of the three operations is performed by a separate unit.

To maximize efficiency, one creates a pipeline of stages. We’ll call the washer one stage, the dryer another, and the folding process a third. To run the pipeline, one adds a second load of laundry to the washer just after the first load is added to the dryer. Likewise, the third load is added to the washer after the second is in the dryer and the first is being folded. In this way, one can make progress on three loads of laundry simultaneously. Given infinite loads, the pipeline will consistently complete a load at the rate of the slowest stage in the pipeline.
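To make the laundry arithmetic concrete, here is a minimal sketch that computes when each load leaves the last stage. The function name and the stage durations (30, 60, and 15 minutes) are made up for illustration, and the model assumes a load can wait between stages for as long as it needs to.

```python
def pipeline_finish_times(stage_times, num_loads):
    """Return the time at which each load leaves the last stage.

    A stage starts a load once the load has left the previous stage
    and the stage itself has finished the previous load.
    """
    # free_at[s] is the time at which stage s finishes its current load.
    free_at = [0.0] * len(stage_times)
    finish_times = []
    for _ in range(num_loads):
        ready = 0.0  # time at which this load is available to the next stage
        for s, duration in enumerate(stage_times):
            start = max(ready, free_at[s])
            ready = start + duration
            free_at[s] = ready
        finish_times.append(ready)
    return finish_times


# Hypothetical durations in minutes: wash 30, dry 60, fold 15.
times = pipeline_finish_times([30, 60, 15], num_loads=5)
gaps = [later - earlier for earlier, later in zip(times, times[1:])]
print(times)  # [105.0, 165.0, 225.0, 285.0, 345.0]
print(gaps)   # [60.0, 60.0, 60.0, 60.0]
```

The first load takes the full 105 minutes, but every later load finishes 60 minutes after the one before it, which is the duration of the slowest stage (the dryer).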

“We needed to find a way to keep all hardware busy all the time. That’s the network cards, the CPU cores and all the GPU cores. To do it, we borrowed a page from CPU design,” explains Solana Founder and CTO Greg Fitzgerald. “We created a four-stage transaction processor in software. We call it the TPU, our Transaction Processing Unit.”

On the Solana network, the pipeline mechanism — Transaction Processing Unit — progresses through Data Fetching at the kernel level, Signature Verification at the GPU level, Banking at the CPU level, and Writing at the kernel space. By the time the TPU starts to send blocks out to the validators, it’s already fetched in the next set of packets, verified their signatures, and begun crediting tokens.
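The shape of a four-stage software pipeline can be sketched with one thread per stage and a queue between neighbours. This is only an illustration of the pattern, not Solana's implementation: the stage bodies are placeholders that record their own name, and `run_stage`, `run_pipeline`, and `make_stage` are hypothetical helpers. In the real TPU the stages run on the kernel, the GPU, and the CPU cores rather than on Python threads.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def run_stage(work, inbox, outbox):
    """Apply `work` to every batch from inbox and pass the result on."""
    while True:
        batch = inbox.get()
        if batch is SENTINEL:
            outbox.put(SENTINEL)  # tell the next stage to stop too
            return
        outbox.put(work(batch))


def run_pipeline(stages, batches):
    """Run each stage in its own thread, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for batch in batches:
        queues[0].put(batch)
    queues[0].put(SENTINEL)
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


def make_stage(name):
    """Placeholder stage: append the stage name to the batch's trace."""
    def work(batch):
        batch_id, trace = batch
        return batch_id, trace + [name]
    return work


STAGE_NAMES = ["fetch", "sigverify", "banking", "write"]
stages = [make_stage(name) for name in STAGE_NAMES]
results = run_pipeline(stages, [(i, []) for i in range(3)])
print(results[0])  # (0, ['fetch', 'sigverify', 'banking', 'write'])
```

While batch 0 is in the write stage, batch 1 can already be in banking and batch 2 in sigverify, which is the overlap described above.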

The Validator node simultaneously runs two pipelined processes: one used in leader mode, called the TPU, and one used in validator mode, called the TVU. In both cases, the hardware being pipelined is the same: the network input, the GPU cards, the CPU cores, writes to disk, and the network output. What each does with that hardware is different. The TPU exists to create ledger entries, whereas the TVU exists to validate them.

“We knew that signature verification was going to be a bottleneck, but also that it’s this context-free operation that we could offload to the GPU,” says Fitzgerald. “Even after offloading this most expensive operation, there’s still a number of additional bottlenecks, such as interacting with the network drivers and managing the data dependencies within smart contracts that limit concurrency.”

Thanks to GPU parallelization in this four-stage pipeline, the Solana TPU can be making progress on 50,000 transactions simultaneously at any given moment. “This can all be achieved with an off-the-shelf computer for under $5000,” explains Fitzgerald. “Not some supercomputer.”

With signature verification offloaded to the GPU in Solana’s Transaction Processing Unit, the network can achieve single-node efficiency. Achieving this has been the goal of Solana since inception.

“The next challenge is to somehow get the blocks from the leader node out to all the validator nodes, and to do it in a way that doesn’t congest the network and bring throughput to a crawl,” continues Fitzgerald. “For that, we’ve come up with a block propagation strategy that we call Turbine.

“With Turbine, we structure the Validator nodes into multiple levels, where each level is at least twice the size of the one above it. By having this structure, these distinct levels, confirmation time ends up being proportional to the height of the tree and not the number of nodes in it, which is far greater. Every time the network doubles in size, you’ll see a small bump in confirmation time, but that’s it.”
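A quick way to see why doubling the network adds only a small bump is to count the levels. The sketch below assumes a single node at the top and a growth factor of exactly 2 per level; both are illustrative choices rather than protocol parameters, and `turbine_levels` is a hypothetical helper.

```python
def turbine_levels(num_validators, growth=2):
    """Count the levels needed when each level is `growth` times the one above."""
    if num_validators < 1:
        raise ValueError("need at least one validator")
    levels = 0
    capacity = 0      # validators that fit in the levels counted so far
    level_size = 1    # assumed size of the top level
    while capacity < num_validators:
        capacity += level_size
        level_size *= growth
        levels += 1
    return levels


for n in (200, 400, 800, 1600):
    print(n, turbine_levels(n))
# 200 8
# 400 9
# 800 10
# 1600 11
```

Under these assumptions, each doubling of the validator count adds one level, so the tree height grows logarithmically with network size.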

In addition to technological implementations like Pipelining, there are several key innovations that make Solana’s web-scale blockchain functionality possible. For a deeper understanding of them all, you can read about them on the Solana blog: 

8 key innovations that make the Solana network possible:


Solana Now Supports Libra’s Move VM

Solana is a high performance blockchain with 400ms blocks and a flexible runtime that allows computation to scale with validator hardware. On current iterations of the Solana Testnet, a network of 200 physically distinct nodes supports a sustained throughput of more than 50,000 transactions per second when running with GPUs. We believe it to be the most performant blockchain in the world. 

Facebook’s Libra project is notable for a number of reasons that have already been discussed by the blockchain community in some depth: It’s a blockbuster project built by some of the brightest minds in technology, backed by some of the largest businesses in the world, and will likely be a major boon for adoption of blockchain technology. But the Solana team in particular noted that Libra’s bespoke Move smart contract language separates shared data from the smart contract code that would modify it.

We found this factor particularly interesting because the Solana team made the same design decision in our runtime—Pipeline. We recognized immediately that Move is a smart contracts language that could not only scale, but share compatibility with Solana. This suggested to us that Move code could be utilized on Solana, and take further advantage of the highly optimized environment of the Solana network. 

Just two weeks later, Solana co-founder Stephen Akridge posted that he was able to execute Libra’s peer-to-peer payment transactions on Solana:

A few days later, we got to work on integration and benchmarking, and through a combination of great language design from the Libra team and meticulous optimization from the Solana team, we are proud to announce support for the Move VM on Solana. What this means is that projects and applications built with Move are compatible with Solana, while being able to utilize the exceptional transactional speed and capacity of the Solana network. 

Move VM Bottlenecks

Pipeline, the Solana transaction processing runtime, allows for parallel execution of transactions across horizontally scaled compute and storage. We suspected that the main difference between executing Move and native Solana transactions would be the additional overhead of interpreting Move bytecode in the Move VM running on Solana.

Below is the benchmarking for the Move VM on Solana:

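| range | time | calls | avg | min | max |
| --- | --- | --- | --- | --- | --- |
| deserialize | 1.07479s | 33,745 | 31.85us | 25.006us | 6.1966ms |
| verify | 1.25666s | 33,745 | 32.113us | 27.411us | 3.9678ms |
| execute | 23.1015s | 33,745 | 684.59us | 553.26us | 16.527ms |
| keyed_accounts_time | 286.14ms | 33,745 | 8.4790us | 6.3010us | 487.90us |
| data_store | 4.12289s | 33,745 | 122.18us | 82.128us | 12.985ms |

| range | time | calls | avg | min | max |
| --- | --- | --- | --- | --- | --- |
| execute::allocator | 182.69ms | 36,689 | 4.9790us | 3.900us | 1.0624ms |
| execute::tx_meta | 1.53850s | 36,689 | 41.933us | 32.092us | 15.700ms |
| execute::module | 28.271ms | 34,901 | 4.0490us | 3.14us | 116.64us |
| execute::vm_execute | 4.08748s | 34,901 | 605.33us | 487.04us | 24.994ms |
| execute::make_write_set | 164.62ms | 36,689 | 4.572us | 3.512us | 2.1615ms |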

As you can see, the average execute time is about 684us per call, and about 605us of that is spent in the `vm_execute` function, which runs the bytecode interpreter. This confirmed our expectations. We consider the results of this benchmarking to be truly remarkable, because Pipeline can scale parallel execution of transactions across as many CPU cores as are available to the validator.

Solana native programs spend about 1us in the step that corresponds to `execute::vm_execute`, since their instructions are native x86. The Move VM is nearly 700x slower than our native Solana programs, and at this speed we are able to demonstrate how well Pipeline handles horizontal scaling across CPUs.
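For readers who want to check the ratios, here is the arithmetic behind those two statements, using the average times from the benchmark table and the roughly 1us native figure quoted above. The variable names are ours.

```python
avg_execute_us = 684.59      # average `execute` time from the table
avg_vm_execute_us = 605.33   # average `execute::vm_execute` time from the table
native_execute_us = 1.0      # approximate native program time quoted above

# Share of the execute step spent inside the bytecode interpreter.
vm_share = avg_vm_execute_us / avg_execute_us

# Slowdown of the whole execute step relative to a native program.
slowdown = avg_execute_us / native_execute_us

print(f"{vm_share:.1%}")   # 88.4%
print(f"{slowdown:.0f}x")  # 685x
```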

Increasing the CPU count scales throughput linearly. We have no doubt that the initial VM performance can be dramatically improved with a JIT, or an LLVM front end to allow for direct compilation to a native instruction set.

Move Raw TPS

These benchmarks are run on our testnet with 400ms block times:

Setup: 4 nodes, GCP n1-standard-32 with 2x P100

| CPU bank threads | Mean TPS | Max TPS |
| --- | --- | --- |
| 4 | 1,454 | 3,372 |
| 32 | 3,244 | 11,961 |

Libra’s Move VM will be utilized by a huge number of developers all over the world as the project attempts to funnel much of Facebook’s huge market towards cryptocurrency and blockchain applications. That Move is so readily compatible with Solana, and capable of achieving outstanding degrees of further optimization when utilizing the core innovations of the Solana Network, is a highly encouraging sign. It suggests that projects building with Move can be relatively simply ported to Solana, and reap the benefits of speed and transactional throughput.

This proves that although the Move VM runtime is not optimized today, it can be integrated relatively easily into Solana’s chain, where parallel performance can be extracted. We look forward to giving developers a choice of Move environments so they are not locked into a single vendor. We believe that the result will be a more robust and functional blockchain ecosystem. 

If you’re a web developer or working on blockchain and smart contracts, the Move programming language has a lot to offer. You can check out Solana’s Move SDK example here, and run benchmarking for the Move VM on Solana here. If you’d like to learn more about Solana and the core innovations that make it the world’s first web-scale blockchain, this blog post is a great place to start. 


Cloudbreak — Solana’s Horizontally Scaled State Architecture

Solana is the most performant permissionless blockchain in the world. On current iterations of the Solana Testnet, a network of 200 physically distinct nodes supports a sustained throughput of more than 50,000 transactions per second when running with GPUs. Achieving this requires the implementation of several optimizations and new technologies, and the result is a breakthrough in network capacity that signals a new phase in blockchain development.

There are 8 key innovations that make the Solana network possible:

In this blog post, we’ll go over Cloudbreak, Solana’s horizontally scaled state architecture.

Overview: RAM, SSDs, and Threads

When scaling a blockchain without sharding, it is not enough to only scale computation. The memory that is used to keep track of accounts quickly becomes a bottleneck in both size and access speeds. For example: It’s generally understood that LevelDB, the local database engine that many modern chains use, cannot support more than about 5,000 TPS on a single machine. That’s because the virtual machine is unable to exploit concurrent read and write access to the account state through the database abstractions.

A naive solution is to maintain the global state in RAM. However, it’s not reasonable to expect consumer-grade machines to have enough RAM to store the global state. The next option is using SSDs. While SSDs reduce the cost per byte by a factor of 30x or more, they are 1000x slower than RAM. Below is the datasheet from the latest Samsung SSD, which is one of the fastest SSDs on the market.

Samsung SSD Datasheet

A single-spend transaction needs to read 2 accounts and write to 1. Account keys are cryptographic public keys, and are totally random and have no real data locality. A user’s wallet will have many Account addresses, and the bits of each address are completely unrelated to any other address. Because there is no locality between accounts, it is impossible for us to place them in memory such that they are likely to be close to each other.

With a max of 15,000 unique reads per second, a naive single-threaded implementation of an Accounts database using a single SSD will support up to 7,500 transactions per second, since each transaction needs two reads. Modern SSDs support 32 concurrent threads and can therefore support 370,000 reads per second, or roughly 185,000 transactions per second.
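The numbers above follow directly from two account reads per transaction. A small sketch of that arithmetic, with a hypothetical helper name:

```python
READS_PER_TRANSACTION = 2  # a single-spend transaction reads 2 accounts


def max_tps(reads_per_second, reads_per_tx=READS_PER_TRANSACTION):
    """Upper bound on transactions per second when reads are the bottleneck."""
    return reads_per_second // reads_per_tx


single_thread_tps = max_tps(15_000)   # one thread on one SSD
concurrent_tps = max_tps(370_000)     # 32 concurrent threads on one SSD
print(single_thread_tps, concurrent_tps)  # 7500 185000
```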

Cloudbreak

The guiding design principle at Solana is to design software that gets out of the way of the hardware to allow 100% utilization.

Organizing the database of accounts such that concurrent reads and writes are possible between the 32 threads is a challenge. Vanilla open source databases like LevelDB cause bottlenecking because they don’t optimize for this specific challenge in a blockchain setting. Solana does not use a traditional database to solve these problems. Instead, we use several mechanisms utilized by operating systems.

First, we leverage memory-mapped files. A memory-mapped file is a file whose bytes are mapped into the virtual address space of a process. Once a file has been mapped, it behaves like any other memory. The kernel may keep some or none of the memory cached in RAM, but the amount of memory that can be mapped is limited by the size of the disk and not the RAM. Reads and writes are still bound by the performance of the disk.
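As a small illustration of the idea (not Solana's code), Python's standard `mmap` module lets a process write to a file through ordinary slice assignment and then see the same bytes through the regular file API. The temporary file and the one-page size are arbitrary choices for the example.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# Create a one-page file so there is something to map.
fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, PAGE)
    with mmap.mmap(fd, PAGE) as mem:
        mem[0:5] = b"hello"   # write through the mapping as if it were memory
        mem.flush()           # ask the OS to write the dirty page to the file
    os.lseek(fd, 0, os.SEEK_SET)
    on_disk = os.read(fd, 5)  # read back through the ordinary file API
finally:
    os.close(fd)
    os.remove(path)

print(on_disk)  # b'hello'
```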

The second important design consideration is that sequential operations are much faster than random operations. This is true not just for SSDs, but for the entire virtual memory stack. CPUs are great at prefetching memory that is accessed sequentially, and operating systems are great at handling sequential page faults. To exploit this behavior we break up the accounts data structure roughly as follows:

  1. The index of accounts and forks is stored in RAM.
  2. Accounts are stored in memory-mapped files up to 4MB in size.
  3. Each memory map only stores accounts from a single proposed fork.
  4. Maps are randomly distributed across as many SSDs as are available.
  5. Copy-on-write semantics are used.
  6. Writes are appended to a random memory map for the same fork.
  7. The index is updated after each write is completed.

Since account updates are copy-on-write and are appended to a random SSD, Solana receives the benefits of sequential writes and horizontal scaling of the writes across many SSDs for concurrent transactions. Reads are still random access, but since any given fork’s state updates are spread across many SSDs, the reads end up horizontally scaled as well.
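The layout described above can be sketched as a toy in-memory model. It is a simplification under stated assumptions: plain `bytearray` objects stand in for memory-mapped files on separate SSDs, the 4MB cap and lookups through parent forks are left out, and the class and method names are hypothetical rather than taken from the Solana codebase.

```python
import random


class AppendOnlyAccounts:
    """Toy accounts store: RAM index plus append-only storage per fork."""

    def __init__(self, maps_per_fork=4, seed=0):
        self.maps_per_fork = maps_per_fork
        self.rng = random.Random(seed)
        self.storage = {}  # fork -> list of bytearrays (stand-ins for memory maps)
        self.index = {}    # (fork, key) -> (map number, offset, length)

    def write(self, fork, key, data):
        maps = self.storage.setdefault(
            fork, [bytearray() for _ in range(self.maps_per_fork)]
        )
        map_no = self.rng.randrange(self.maps_per_fork)  # random map for this fork
        offset = len(maps[map_no])
        maps[map_no] += data  # sequential append, never an in-place update
        # The index is updated only after the write has completed.
        self.index[(fork, key)] = (map_no, offset, len(data))

    def read(self, fork, key):
        map_no, offset, length = self.index[(fork, key)]
        return bytes(self.storage[fork][map_no][offset:offset + length])

    def stored_bytes(self, fork):
        return sum(len(m) for m in self.storage.get(fork, []))


store = AppendOnlyAccounts()
store.write(fork=1, key="alice", data=b"balance=10")
store.write(fork=1, key="alice", data=b"balance=7")
print(store.read(1, "alice"))  # b'balance=7'
print(store.stored_bytes(1))   # 19
```

The second write does not overwrite the first. Both versions occupy space (10 + 9 = 19 bytes) and only the index decides which one is current, which is why stale versions have to be reclaimed later.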

Cloudbreak also performs a form of garbage collection. As forks become finalized beyond rollback and accounts are updated, old invalid accounts are garbage collected, and memory is relinquished.

There’s at least one more great benefit of this architecture: computing the Merkle root of the state updates for any given fork can be done with sequential reads that are horizontally scaled across SSDs. The drawback of this approach is the loss of generality to the data. Since this is a custom data structure, with custom layout, we are unable to use general-purpose database abstractions for querying and manipulating the data. We had to build everything from the ground up. Fortunately, that’s done now.

Benchmarking Cloudbreak

While the Accounts database is in RAM, we see throughput that matches RAM access times, while scaling with the number of available cores. At 10m accounts, the database no longer fits in RAM. However, we still see performance near 1m in reads or writes per second on a single SSD.

Learn more about Tour de SOL — Solana’s incentivized testnet event.

Solana’s utilization of Cloudbreak, alongside innovations like Proof of History, Sealevel, and Tower BFT, combines to create the world’s first web-scale blockchain. Solana’s testnet is live today. You can see it at https://testnet.blog.solana.com. For cost purposes, we are only running a handful of nodes. However, we have spun it up on many instances to over 200 physically distinct nodes (not on shared hardware) across 23 data centers on AWS, GCE, and Azure for benchmarking.

The runtime is functioning today, and developers can deploy code on the testnet now. Developers can build smart contracts in C today, and we are aggressively working on the Rust toolchain. Rust will be the flagship language for Solana smart contract development. The Rust toolchain is publicly available as part of the Solana Javascript SDK, and we are further iterating on the Software Development Kit.

Solana will soon launch a public beta incentivizing validators to run nodes via Tour de SOL — analogous to Cosmos’ Game of Stakes — that challenges the public at large to test the limits of the Solana network while earning tokens for doing so.