This article hopes to explain in a clear and comprehensive manner what Plasma Cash is and how it helps solve Ethereum’s scalability issues. It will also explain the trade-offs made when using Plasma. This post is aimed at both beginner and intermediate readers.
Plasma Cash is a scaling solution for the Ethereum blockchain. It makes use of sidechains for faster and cheaper transactions, and periodically checkpoints its state to the main Ethereum network to utilise the security and decentralisation of the main network.
Plasma Cash is one variant of the Plasma framework. Other variants include Plasma Debit and Plasma MVP. All are somewhat similar.
The Ethereum blockchain can process ~10–28 transactions per second (TPS). When the network is congested users have to wait longer for their transactions to be processed. Even when the network isn’t overloaded, Ethereum transactions are slow and expensive.
The 28 TPS limit comes from Ethereum’s gas limit and how quickly blocks are processed. In Ethereum, each transaction uses a certain amount of gas. A simple transaction such as sending someone ether costs 21,000 gas. More complex smart contract transactions can cost millions of gas.
Ethereum will allow a maximum of ~8 million gas per block and it takes an average of ~14 seconds for a block to be mined. This means that each block can contain a maximum of ~400 transactions (~8,000,000 / 21,000). This gives us a maximum throughput of ~28 TPS (400 / 14).
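The arithmetic above can be reproduced in a few lines of Python. This is only a back-of-the-envelope sketch using the approximate figures quoted in this article, and the constant and function names are made up for illustration. The exact quotient is 380 simple transfers per block, which the text rounds to ~400.

```python
# Approximate figures quoted in the article, not live network values.
BLOCK_GAS_LIMIT = 8_000_000    # ~8 million gas per block
SIMPLE_TRANSFER_GAS = 21_000   # gas used by a plain ether transfer
BLOCK_TIME_SECONDS = 14        # average time between blocks


def max_transfers_per_block(gas_limit: int, gas_per_tx: int) -> int:
    """Number of simple transfers that fit into one block."""
    return gas_limit // gas_per_tx


def max_tps(gas_limit: int, gas_per_tx: int, block_time: float) -> float:
    """Upper bound on transactions per second for simple transfers."""
    return max_transfers_per_block(gas_limit, gas_per_tx) / block_time


per_block = max_transfers_per_block(BLOCK_GAS_LIMIT, SIMPLE_TRANSFER_GAS)
tps = max_tps(BLOCK_GAS_LIMIT, SIMPLE_TRANSFER_GAS, BLOCK_TIME_SECONDS)
print(per_block)      # 380
print(round(tps, 1))  # 27.1
```

More complex transactions use more gas, so real-world throughput sits below this upper bound.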
When performing a transaction one must pay for the gas used. This fee is paid to the miner of the block that includes the transaction. One may choose any gas price they wish, but if not properly incentivised, a miner will not include the transaction in a block. Moreover, if the network is congested and a block is full, a miner must choose which transactions to include. The miner will likely choose the transactions paying more for gas. In general, the more one pays for gas, the faster the transaction will be processed.
Today the gas price is usually ~1–2 Gwei (1 Gwei = 10^-9 ether). This means a simple transaction that uses 21,000 gas will cost the sender ~$0.01 ($200 * 21,000 * 2 * 10^-9, assuming the price of ether is $200 and a gas price of 2 Gwei). At the start of 2018, the price of ether was $1,400, and at peak times one would have to pay 50+ Gwei for a transaction to be processed within a reasonable amount of time. Under these conditions, a simple transaction costs ~$1.50 at 50 Gwei ($1,400 * 21,000 * 50 * 10^-9) and $3.50 once the gas price reaches ~120 Gwei, while a complex smart contract transaction costs over $20. Quite a price to pay to breed your CryptoKitties.
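To make the fee formula concrete, here is a small sketch that plugs in the prices mentioned above. The function name is hypothetical and the inputs are the article's example figures, not current market data.

```python
GWEI_PER_ETHER = 10**9  # 1 Gwei = 10^-9 ether


def tx_cost_usd(gas_used: int, gas_price_gwei: int, ether_price_usd: int) -> float:
    """Fee in US dollars for a transaction paying `gas_price_gwei` per unit of gas."""
    fee_in_gwei = gas_used * gas_price_gwei
    return fee_in_gwei * ether_price_usd / GWEI_PER_ETHER


# Simple transfer at 2 Gwei with ether at $200.
print(tx_cost_usd(21_000, 2, 200))     # 0.0084
# Simple transfer at 50 Gwei with ether at $1,400.
print(tx_cost_usd(21_000, 50, 1_400))  # 1.47
```

The cost scales linearly with both the gas price and the price of ether, which is why fees balloon when the network is congested.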
Finding a Solution
There are a few different ways to approach these problems. Some proposed solutions make changes to the core Ethereum protocol. These solutions are known as layer 1 solutions.
Other approaches build on top of the existing infrastructure and take some of the processing off-chain. Solutions of this type are known as layer 2 solutions.
Plasma Cash is a layer 2 scaling solution.
It makes use of two key insights:
- Multiple blockchains can handle greater throughput than a single blockchain. If one blockchain can handle 25 TPS, then 10 blockchains can handle 250 TPS.
- It is possible to handle far greater throughput if we aren’t as strict as Ethereum on security and decentralisation. Ethereum uses Proof of Work. If we don’t, we can handle a far greater load.
The challenge however is to ensure things stay secure, trustless and decentralised. Proof of work exists for a reason.
One can create a system where all one needs to do to transfer value to another is sign a transaction with a private key. The problem with such a system is that a single user can sign two transactions making the same payment to two different people: the “double spend” problem. Solving this problem in a trustless and decentralised manner was not simple. Bitcoin, and subsequently Ethereum, solved this problem using Proof of Work.
If we create a sidechain that does not use Proof of Work or does not care as much about decentralisation it will be able to handle far greater throughput, but what will prevent double spends? Is there a way we can utilise the main Ethereum network to avoid double spends, while using a sidechain to perform fast and cheap transactions?
The Plasma Cash Solution
Plasma Cash builds on the above principles and makes use of two blockchains: the main Ethereum chain and a Plasma sidechain. The mainchain provides the security for the system. The sidechain provides fast and cheap transactions.
The Plasma sidechain is able to use any consensus mechanism desired and can even be fully centralised. As it is able to use any consensus mechanism it can handle far greater throughput, potentially thousands of transactions per second.
Coins can be transferred between the mainchain and the sidechain and back. One can transfer funds from the mainchain to the sidechain, then transfer those funds to someone else on the sidechain, and that person to others and so on. Eventually someone may decide to cash out the funds from the sidechain making them available once again in their mainchain account.
One question that arises is how the mainchain knows the true owner of a coin. If Alice transfers funds to Bob on the sidechain, and Alice and Bob now both try to withdraw it to the mainchain, how does the mainchain know which withdrawal is valid? What if Charles also tries to cash out this coin claiming that Bob sent him the funds afterwards?
A naive solution is to have the sidechain update the mainchain each time a transaction occurs. The mainchain can then easily check who the true owner is and release the locked funds to them. But this solution has a clear flaw. If each transaction that happens on the sidechain is sent to the mainchain, the system will still be slow and expensive, so why use a sidechain at all?
A better solution is to batch many sidechain transactions together and then submit a compressed version of this batch to the mainchain in one go. This is the Plasma Cash solution.
For example, 1,000 transactions happen in the span of a minute on the sidechain. The sidechain operator then groups all 1,000 transactions together, compresses the group, and submits the compressed data to the mainchain.
Using Merkle trees we can compress a large dataset to an extremely small size. We actually lose a lot of data when performing this compression (what is known as a lossy compression algorithm), and this is how we are able to compress the data to be so small. But the magic of Merkle trees means we still have enough information to verify the true owner of the coin. We will cover the magic of Merkle trees in greater detail below.
Using this system, instead of being limited to 400 transactions per Ethereum block, we can now handle 400 * 1,000 = 400,000 transactions per block. We can handle even more if each batch contains over 1,000 transactions.
Now when Bob wishes to withdraw, the mainchain smart contract can check the compressed data to verify that Bob is the true owner of the funds.
We mentioned that the sidechain can use any consensus mechanism which is why it can handle so many transactions per second. Does one have to trust the sidechain operator?
The job of the sidechain operator is to collect transactions, batch them together, and submit them to the mainchain. The operator cannot sign a transaction for someone else as they do not own the private key to the account. The operator is not in charge of withdrawing on the mainchain either. This is the user’s job and the sidechain operator cannot prevent a withdrawal request.
A sidechain operator could however decide to submit a double spend transaction (for example, submit two transactions, one where Alice sent a coin to Bob, and another where Alice sent the same coin to Charles). The operator could also decide to censor transactions (for example, Alice sent a coin to Bob, but the operator never puts this transaction into a block submitted to the mainchain).
Both of the above situations are possible, but each can be handled.
In the scenario where the sidechain operator submits a double spend to the mainchain, the transaction that was submitted first is the one that is considered valid. If the signed transaction from Alice to Bob is submitted first, then Bob will be the one that can cash out on the mainchain.
Charles needs to keep an eye on all the blocks submitted to the mainchain. If he sees that a block containing Alice’s transfer to Bob has already been submitted to the mainchain, then he will know that Alice’s transaction to him is invalid.
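The "first one wins" rule can be sketched as a simple scan over the submitted blocks. This is a toy model: the `Transfer` record and both functions are hypothetical, signatures and Merkle proofs are left out, and it ignores the case where Alice later receives the same coin back and spends it again legitimately.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Transfer:
    coin_id: int
    sender: str
    recipient: str


def first_spend(
    blocks: Dict[int, List[Transfer]], coin_id: int, sender: str
) -> Optional[Tuple[int, Transfer]]:
    """Earliest (block number, transfer) in which `sender` spent `coin_id`."""
    for number in sorted(blocks):
        for tx in blocks[number]:
            if tx.coin_id == coin_id and tx.sender == sender:
                return number, tx
    return None


def is_valid_receipt(
    blocks: Dict[int, List[Transfer]], block_number: int, tx: Transfer
) -> bool:
    """A transfer only counts if it is the sender's first spend of that coin."""
    return first_spend(blocks, tx.coin_id, tx.sender) == (block_number, tx)


# Alice signs the same coin over to Bob (block 3) and to Charles (block 4).
to_bob = Transfer(coin_id=7, sender="alice", recipient="bob")
to_charles = Transfer(coin_id=7, sender="alice", recipient="charles")
blocks = {3: [to_bob], 4: [to_charles]}

print(is_valid_receipt(blocks, 3, to_bob))      # True
print(is_valid_receipt(blocks, 4, to_charles))  # False
```

This is the check Charles would run before accepting the coin as payment.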
In a scenario where the sidechain operator decides to censor transactions, the coin owner will have to withdraw their funds themselves. For example, if Alice wishes to send Bob a coin, but the transaction is never submitted to the mainchain, Bob should consider the transaction not to have happened. If Alice wishes to send the coin to Bob in any case, she should withdraw the coin to the mainchain and then send him the funds there. This is clearly an annoyance, but the point is that the operator cannot stop Alice transferring funds to Bob. An operator who wishes to maintain a good reputation will not censor transactions, and censorship is futile in any case. The worst they can do is cause some slight annoyance to a user.
If you’d like to read more about exit scenarios I recommend this article by Karl Floersch:
We have explained the core pieces of Plasma Cash. We will now dive into further detail covering some of the items we passed over above.
When we talk about transferring funds from the mainchain to the sidechain, what happens in practice is that the funds get locked in a smart contract on the mainchain. The sidechain then creates a coin with that same value. Once someone decides to cash out a coin from the sidechain, the coin is destroyed on the sidechain and, after a certain waiting period ends, the funds are unlocked on the mainchain and sent to the withdrawer’s account.
Each coin is given a unique identification number. A coin cannot be broken down into smaller pieces. Nor can two separate coins be combined into a single coin. This is similar to physical coins or banknotes. A $10 bill cannot be combined with another $10 bill to form a $20 bill. Nor can one break a $10 bill into two $5 bills. If one would like to pay someone $5, but only has a $10 bill, they will have to ask for $5 in change. The same is true with Plasma Cash coins.
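The locking mechanism and the indivisible coins can be pictured with a toy Python model. This is not how a real mainchain contract is written (that would be a Solidity contract with exit checks); the class and method names are invented purely to illustrate that every deposit becomes one coin with its own id and a fixed value.

```python
class ToyDepositContract:
    """Stand-in for the mainchain contract that locks deposited funds."""

    def __init__(self) -> None:
        self.locked = {}    # coin id -> amount locked for that coin (in wei)
        self._next_id = 0

    def deposit(self, amount_wei: int) -> int:
        """Lock funds and mint one indivisible coin; returns its unique id."""
        if amount_wei <= 0:
            raise ValueError("deposit must be positive")
        coin_id = self._next_id
        self._next_id += 1
        self.locked[coin_id] = amount_wei
        return coin_id

    def release(self, coin_id: int) -> int:
        """Unlock the full value of a coin once its withdrawal has gone through."""
        return self.locked.pop(coin_id)


contract = ToyDepositContract()
first = contract.deposit(10)
second = contract.deposit(10)
# Two deposits of 10 give two separate coins, never one coin worth 20.
print(first, second, contract.locked)
```

There is deliberately no `split` or `merge` method: a coin can only be released for exactly the amount that was locked for it.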
A user can transfer ether to another user on the sidechain by signing a transaction and submitting it to the sidechain operator.
The sidechain operator groups transactions into blocks. A sparse Merkle tree is created with the block transactions as the leaves of the Merkle tree. A transaction for coin i will be leaf i of the tree. For example, a transaction involving the coin with id 10 will be the 10th leaf of the tree. (More on Merkle trees below.)
The sidechain operator submits the Merkle root of each block to the mainchain smart contract. It is up to the sidechain operator to decide how often they submit blocks.
When a user wishes to withdraw from the sidechain they submit a withdrawal request which includes the transaction in which they received the coin as well as the Merkle proof. Using the Merkle proof, the smart contract can verify that this transaction was indeed included in a sidechain block that was previously submitted by the sidechain operator. The smart contract also checks the account that signed the sidechain transaction.
When a withdrawal request is submitted a challenge period begins (one week for example). During this period anyone can challenge the withdrawal request as invalid. One does this by submitting a proof for the true owner of the coin. For example, if Alice sent a coin to Bob in block 3, and sent the same coin to Charles in block 4, then if Charles tries to withdraw the coin, someone can submit a proof that the coin was in fact transferred to someone else in an earlier block, cancelling Charles’s invalid withdrawal.
Withdrawal requests and challenges require users to stake funds. If a withdrawal request or challenge is found to be fraudulent, these funds are slashed, disincentivising people from trying to cheat the system.
If there are no successful challenges, then after the waiting period has ended, the funds are available for release into the withdrawer’s account.
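The withdrawal flow described above can be sketched as follows. This covers only the single challenge case used in the example (an earlier transfer of the same coin by the same sender). Merkle proof checks and staking are omitted, and all names here are hypothetical rather than taken from any real Plasma Cash contract.

```python
from dataclasses import dataclass

CHALLENGE_PERIOD = 7 * 24 * 60 * 60  # one week in seconds, as in the example


@dataclass
class ExitRequest:
    coin_id: int
    claimant: str       # who is trying to withdraw the coin
    sender: str         # who signed the transfer to the claimant
    block_number: int   # sidechain block containing that transfer
    started_at: int     # timestamp at which the request was submitted
    cancelled: bool = False


def challenge(exit_request: ExitRequest, coin_id: int, sender: str, block_number: int) -> bool:
    """Cancel the exit if the same sender moved the same coin in an earlier block."""
    proves_earlier_spend = (
        coin_id == exit_request.coin_id
        and sender == exit_request.sender
        and block_number < exit_request.block_number
    )
    if proves_earlier_spend:
        exit_request.cancelled = True
    return proves_earlier_spend


def can_finalise(exit_request: ExitRequest, now: int) -> bool:
    """Funds are released only after an unchallenged waiting period."""
    return not exit_request.cancelled and now >= exit_request.started_at + CHALLENGE_PERIOD


bob_exit = ExitRequest(17, "bob", "alice", block_number=3, started_at=0)
charles_exit = ExitRequest(17, "charles", "alice", block_number=4, started_at=0)

# Someone points at Alice's block 3 transfer to cancel Charles's exit.
print(challenge(charles_exit, 17, "alice", 3))   # True
# Pointing at the later block 4 transfer does not invalidate Bob's exit.
print(challenge(bob_exit, 17, "alice", 4))       # False
print(can_finalise(bob_exit, CHALLENGE_PERIOD))  # True
```

In a real system the challenger would also have to prove, with a Merkle proof, that the earlier transfer was included in a submitted block.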
One item we skipped over above was Merkle trees and proofs. Below is a more detailed explanation to fill in the missing pieces:
A Merkle tree is a tamper-resistant data structure that allows a large amount of data to be compressed into a single hash and can be queried for the presence of specific elements in the data with a proof constructed in logarithmic space.
A Merkle tree looks like this:
Each rectangle in the above image is called a node. A binary Merkle tree has a root node (the node at the top of the image) with two branches coming off it pointing to another two nodes. Each node in the tree has two child nodes until we reach the bottom of the tree. The above image has four nodes on the bottom level, but you could have a tree of depth 10 with 1024 nodes at the bottom.
A Merkle tree is a compressed version of some data where the data is split into a certain number of parts, with each leaf node being a hash of one of these parts. Each parent node is the hash of its two child nodes concatenated together.
So for example, in the graphic above we have our data which is split into 4 data blocks: L1, L2, L3 and L4. We have 4 leaf nodes that are a hash of each part respectively. So node 0–0 is the hash of L1, node 0–1 is the hash of L2 and so on. Then node 0 is the hash of 0–0 and 0–1 combined and the root node is the hash of node 0 and node 1.
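The construction just described is short enough to write out. The sketch below uses SHA-256 from Python's standard library purely for convenience (that choice is an assumption; Ethereum contracts typically use Keccak-256), and the function names are illustrative.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    """Cryptographic hash used for every node in the tree."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks: List[bytes]) -> bytes:
    """Root of a binary Merkle tree; the number of blocks must be a power of two."""
    if not blocks or len(blocks) & (len(blocks) - 1):
        raise ValueError("number of data blocks must be a power of two")
    level = [h(block) for block in blocks]  # leaf nodes: 0-0, 0-1, 1-0, 1-1
    while len(level) > 1:
        # Each parent is the hash of its two children concatenated together.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


data = [b"L1", b"L2", b"L3", b"L4"]
root = merkle_root(data)
print(root.hex())
```

Whatever the size of the input, the root is always 32 bytes, and changing any single data block changes the root.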
In our case, each hash is a cryptographic hash which means you cannot work out what the input to the hash function is from the output. This is similar in spirit to how public and private keys work. A public key is derived from a private key using a one-way function (elliptic curve multiplication rather than a hash), so knowing the public key alone you won’t be able to work out what the private key is.
What we get from all this is a compressed version of our data. From the root node you won’t be able to work out what the data blocks are, but if you have the root node and the data you’ll be able to verify that this root node is indeed the Merkle root of that data.
The data in our example is L1, L2, L3, L4. This could be the four words “Hello I am Satoshi” (I’m not ;)) or it could be paragraphs of text or images or a video. Whatever you like.
Torrents are another example that can make use of Merkle trees (BitTorrent v2, for instance, hashes file pieces into Merkle trees). When downloading something via torrent you receive lots of small pieces of data from lots of different computers. Somebody may decide to send you malicious code. How does your torrent client verify that it is receiving the pieces of data it expects? It does this by downloading the Merkle root of the data from a trusted source. This is a very small piece of data, but with it you can verify that the 1GB movie you just downloaded is what you’re expecting. If someone sends you an invalid piece of data, your torrent client will ask another computer in the network for the correct piece of data until the hashes match up with the root node.
A nice property of Merkle trees is that you don’t need to download the whole tree to verify parts of it. You just need to verify a path going up the tree. In our example, I can verify that L2 is part of the tree if someone sends me node 0–0 and node 1. I can then do all the hashing I need to verify the root node is as expected. I don’t need to be sent any other nodes in the tree to do this. This process is known as a Merkle proof: proving that a node is in a tree by sending only a single branch of the tree.
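Here is a minimal sketch of building and checking such a proof for the four-block example. As before, SHA-256 and all function names are assumptions made for illustration. For L2 (index 1) the proof consists of exactly the two nodes mentioned above: node 0-0 and node 1.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks: List[bytes]) -> List[List[bytes]]:
    """All levels of the tree, from the leaf hashes up to the root."""
    levels = [[h(block) for block in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def merkle_proof(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes on the path from leaf `index` up to (not including) the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # the neighbour sharing the same parent
        index //= 2
    return proof


def verify(block: bytes, index: int, proof: List[bytes], root: bytes) -> bool:
    """Recompute the path to the root using only the block and its proof."""
    node = h(block)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


blocks = [b"L1", b"L2", b"L3", b"L4"]
levels = build_levels(blocks)
root = levels[-1][0]
proof = merkle_proof(levels, 1)        # proof for L2

print(verify(b"L2", 1, proof, root))   # True
print(verify(b"L5", 1, proof, root))   # False
```

The proof grows with the depth of the tree rather than the number of leaves, so a tree with 1024 leaves needs only 10 sibling hashes.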
If you’d like to read more about Merkle Trees I recommend this article:
Merkle Trees in Plasma Cash
In Plasma Cash, the sidechain operator publishes the Merkle root of each block to the mainchain. If a user wants to prove that their transaction was included in a block, they send the Merkle proof to the smart contract which verifies that the transaction was in fact included in a block.
In Plasma Cash, the leaves of the Merkle tree represent transactions. If a transaction for coin i occurs, it is placed as leaf i of the tree. For example, if Alice sends the coin with id 17 to Bob, this transaction is placed in leaf L17 in the tree. If no transaction for coin 17 is made in the block then L17 is just 0. Only one transaction for a coin can be made in a single block. If Bob would like to transfer coin 17 to Charles, then this transaction will be in the next block or later.
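A sparse Merkle tree with one slot per coin id can be sketched like this. The tree depth of 8, the 32 zero bytes used for an empty leaf, SHA-256 and every function name are assumptions for this demo, not details of a particular implementation. Because empty subtrees always hash to the same value, only the branches that contain transactions need to be computed.

```python
import hashlib
from typing import Dict, List, Tuple

DEPTH = 8               # 2**8 = 256 coin slots, kept small for the demo
EMPTY_LEAF = bytes(32)  # leaf value when a coin has no transaction in the block


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def default_nodes(depth: int) -> List[bytes]:
    """defaults[k] is the hash of a completely empty subtree of height k."""
    defaults = [EMPTY_LEAF]
    for _ in range(depth):
        defaults.append(h(defaults[-1] + defaults[-1]))
    return defaults


def root_and_proof(txs: Dict[int, bytes], coin_id: int, depth: int = DEPTH) -> Tuple[bytes, List[bytes]]:
    """Block root plus the Merkle proof for the slot of `coin_id`."""
    if any(not 0 <= c < 2 ** depth for c in list(txs) + [coin_id]):
        raise ValueError("coin id does not fit in the tree")
    defaults = default_nodes(depth)
    level = {c: h(tx) for c, tx in txs.items()}  # only non-empty nodes are stored
    proof, index = [], coin_id
    for height in range(depth):
        proof.append(level.get(index ^ 1, defaults[height]))
        level = {
            parent: h(
                level.get(2 * parent, defaults[height])
                + level.get(2 * parent + 1, defaults[height])
            )
            for parent in {i // 2 for i in level}
        }
        index //= 2
    return level.get(0, defaults[depth]), proof


def verify(leaf: bytes, coin_id: int, proof: List[bytes], root: bytes) -> bool:
    """Check that `leaf` sits in slot `coin_id` of the tree with this root."""
    node, index = leaf, coin_id
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


tx17 = b"coin 17: alice -> bob"
block = {17: tx17, 200: b"coin 200: dave -> erin"}

root, proof17 = root_and_proof(block, 17)
print(verify(h(tx17), 17, proof17, root))   # True: coin 17 moved in this block

_, proof5 = root_and_proof(block, 5)
print(verify(EMPTY_LEAF, 5, proof5, root))  # True: coin 5 did not move
```

Fixing each coin to its own slot is what makes the second check possible: a proof that slot 5 is empty shows that coin 5 was not spent in that block.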
Problems with Plasma Cash
Plasma Cash does a lot to help scale the Ethereum network, but it still has some problems.
One problem is that one cannot send arbitrary amounts of money. Coins cannot be split or merged, so if Alice only has a 5 ether coin and Bob only has a 1 ether coin, then Alice will not be able to send Bob 3 ether as he doesn’t have correct change.
One solution to this is to have a change maker. Alice would send the change maker her 5 ether coin and would receive a 3 and a 2 ether coin in return and would then be able to pay Bob.
Another solution is Plasma Debit that allows for arbitrarily sized payments. It works in a similar way to Plasma Cash, but with a slightly different Merkle tree where the amount stored in a coin can fluctuate allowing for arbitrarily sized payments. You can learn more about Plasma Debit here:
Another item that needs to be dealt with is watching for fraudulent exits. Plasma Cash is secure assuming you can challenge fraudulent exits, but if you aren’t watching or paying attention, your money can be stolen. One solution is to have a service watching the network on your behalf, but you would then have to trust the service to do its job and challenge in case someone tries to steal your coins. Having multiple watch services running that are all financially incentivised to stop fraudulent exits on large networks may be a good enough solution.
A third issue is the finality of transactions. If you’ve been sent ether on the sidechain you cannot consider the transaction final until the sidechain operator has submitted a Merkle root for a block containing the transaction to the mainchain. This means that the fastest a transaction can become final is the speed at which the mainchain can process a block. If the sidechain operator publishes blocks every hour, then you need to wait up to a full hour for the transaction to be considered final.
If you are looking for quick finality and don’t mind paying a small amount of ether for the transaction to be processed, then processing the transaction on the mainchain is still your best bet. If you are willing to trust the sidechain operator to publish blocks then sidechains do have their benefits. In a situation where networks are congested and sidechain operators are publishing often to the mainchain, this is where the greatest benefits of sidechains will be felt.
Plasma Cash uses a clever system involving sidechains and Merkle roots to scale the Ethereum network. It is still early days for Plasma with the first implementations being completed now.
I work as a full stack and blockchain freelance developer. Feel free to reach out in the comments.