Getting Deep Into EVM: How Ethereum Works Backstage

EVM: 10,000 ft Perspective

Before diving into understanding how EVM works and seeing it working via code examples, let’s see where EVM fits in the Ethereum and what are its components. Don’t get scared by these diagrams because as soon as you are done reading this article you will be able to make a lot of sense out of these diagrams.

The below diagram shows where EVM fits into Ethereum.

The below diagram shows basic Architecture of EVM.

This below diagram shows how different parts of EVM interact with each other to make Ethereum do its magic.

We have seen what EVM looks like. Now it’s time to start understanding how these parts play a significant role in the way Ethereum works.

Ethereum Contracts

Basics

Smart contracts are just computer programs, and we can say that Ethereum contracts are smart contracts that run on the Ethereum Virtual Machine. The EVM is the sandboxed runtime and a completely isolated environment for smart contracts in Ethereum. This means that every smart contract running inside the EVM has no access to the network, file system, or other processes running on the computer hosting the VM.

As we already know, there are two kinds of accounts: contracts and external accounts. Every account is identified by an address, and all accounts share the same address space. The EVM handles addresses of 160-bit length.

Every account consists of a balance, a nonce, bytecode, and stored data (storage). However, there are some differences between these two kinds of accounts. For instance, the code and storage of external accounts are empty, while contract accounts store their bytecode and the merkle root hash of the entire state tree. Moreover, while external addresses have a corresponding private key, contract accounts don’t. The actions of contract accounts are controlled by the code they host in addition to regular cryptographic signing of every Ethereum transaction.

Creation

The creation of a contract is simply a transaction in which the receiver address is empty and its data field contains the compiled bytecode of the contract to be created (this makes sense?—?contracts can create contracts too). Let’s look at a quick example. Please open the directory of exercise 1; in it you will find a contract called MyContract with the following code:

pragma solidity ^0.4.21;
contract MyContract {
event Log(address addr);
function MyContract() public {
emit Log(this);
}
function add(uint256 a, uint256 b) public pure returns (uint256) {
return a + b;
}
}

Let’s open a truffle console in develop mode running truffle develop. Once inside, follow the subsequent commands to deploy an instance of MyContract:

truffle(develop)> compile
truffle(develop)> sender = web3.eth.accounts[0]
truffle(develop)> opts = { from: sender, to: null, data: MyContract.bytecode, gas: 4600000 }
truffle(develop)> txHash = web3.eth.sendTransaction(opts)

We can check that our contract has been deployed successfully by running the following code:

truffle(develop)> receipt = web3.eth.getTransactionReceipt(txHash)
truffle(develop)> myContract = new MyContract(receipt.contractAddress)
truffle(develop)> myContract.add(10, 2)
{ [String: ‘12’] s: 1, e: 1, c: [ 12 ] }

Let’s go deeper to analyze what we just did. The first thing that happens when a new contract is deployed to the Ethereum blockchain is that its account is created.¹ As you can see, we logged the address of the contract in the constructor in the example above. You can confirm this by checking that receipt.logs[0].data is the address of the contract-padded 32 bytes and that receipt.logs[0].topics is the keccak-256 hash of the string “Log(address)”.

What happens backstage when you call a function in your contract

As the next step, the data sent in with the transaction is executed as bytecode. This will initialize the state variables in storage, and determine the body of the contract being created. This process is executed only once during the lifecycle of a contract. The initialization code is not what is stored in the contract; it actually produces as its return value the bytecode to be stored. Bear in mind that after a contract account has been created, there is no way to change its code.²

Given the fact that the initialization process returns the code of the contract’s body to be stored, it makes sense that this code isn’t reachable from the constructor logic. For example, let’s take a look at the Impossible contract of exercise 1:

contract Impossible {
function Impossible() public {
this.test();
}
function test() public pure returns(uint256) {
return 2;
}
}

If you try to compile this contract, you will get a warning saying you’re referencing this within the constructor function, but it will compile. However, if you try to deploy a new instance, it will revert. This is because it makes no sense to attempt to run code that is not stored yet.³ On the other hand, we were able to access the address of the contract: the account exists, but it doesn’t have any code yet.

However, a code execution can produce other events, such as altering the storage, creating further accounts, or making further message calls. For example, let’s take a look at the AnotherContract code:

contract AnotherContract {
MyContract public myContract;
function AnotherContract() public {
myContract = new MyContract();
}
}

Let’s see how it works running the following commands inside a truffle console:

truffle(develop)> compile
truffle(develop)> sender = web3.eth.accounts[0]
truffle(develop)> opts = { from: sender, to: null, data: AnotherContract.bytecode, gas: 4600000 }
truffle(develop)> txHash = web3.eth.sendTransaction(opts)
truffle(develop)> receipt = web3.eth.getTransactionReceipt(txHash)
truffle(develop)> anotherContract = AnotherContract.at(receipt.contractAddress)
truffle(develop)> anotherContract.myContract().then(a => myContractAddress = a)
truffle(develop)> myContract = MyContract.at(myContractAddress)
truffle(develop)> myContract.add(10, 2)
{ [String: ‘12’] s: 1, e: 1, c: [ 12 ] }

Additionally, contracts can be created using the CREATE opcode, which is what the Solidity new construct compiles down to. Both alternatives work the same way. Let’s continue exploring how message calls work.

Message Calls

Contracts can call other contracts through message calls. Every time a Solidity contract calls a function of another contract, it does so by producing a message call. Every call has a sender, a recipient, a payload, a value, and an amount of gas. The depth of message call is limited to less than 1024 levels.

Solidity provides a native call method for the address type that works as follows:

address.call.gas(gas).value(value)(data)

gas is the amount of gas to be forwarded, address is the address to be called, value is the amount of Ether to be transferred in wei, and data is the payload to be sent. Bear in mind that value and gas are optional parameters here, but be careful because almost all the remaining gas of the sender will be sent by default in a low-level call.

gas expenditure map

As you can see, each contract can decide the amount of gas to be forwarded in a call. Given that every call can end in an out-of-gas (OOG) exception, to avoid security issues at least 1/64th of the sender’s remaining gas will be saved. This allows senders to handle inner calls’ out-of-gas errors, so that they are able to finish its execution without themselves running out of gas, and thus bubbling the exception up.

EVM exception map

Let’s take a look at the Caller contract of exercise 2:

contract Implementation {
event ImplementationLog(uint256 gas);
function() public payable {
emit ImplementationLog(gasleft());
assert(false);
}
}
contract Caller {
event CallerLog(uint256 gas);
Implementation public implementation;

function Caller() public {
implementation = new Implementation();
}

function () public payable {
emit CallerLog(gasleft());
implementation.call.gas(gasleft()).value(msg.value)(msg.data);
emit CallerLog(gasleft());
}
}

The Caller contract has only a fallback function that redirects every received call to an instance of Implementation. This instance simply throws through an assert(false) on every received call, which will consume all the gas given. Then, the idea here is to log the amount of gas in Caller before and right after forwarding a call to Implementation. Let’s open a truffle console and see what happens:

truffle(develop)> compile
truffle(develop)> Caller.new().then(i => caller = i)
truffle(develop)> opts = { gas: 4600000 }
truffle(develop)> caller.sendTransaction(opts).then(r => result = r)
truffle(develop)> logs = result.receipt.logs
truffle(develop)> parseInt(logs[0].data) //4578955
truffle(develop)> parseInt(logs[1].data) //71495

As you can see, 71495 is approximately the 64th part of 4578955. This example clearly demonstrates that we can handle an OOG exception from an inner call.

Solidity also provides the following opcode, allowing us to manage calls with inline assembly:

call(g, a, v, in, insize, out, outsize)

Where g is the amount of gas to be forwarded, a is the address to be called, v is the amount of Ether to be transferred in wei, in states the memory position of insize bytes where the call data is held, and out and outsizestate where the return data will be stored in memory. The only difference is that an assembly call allows us to handle return data, while the function will only return 1 or 0 whether it failed or not.

The EVM supports a special variant of a message call called delegatecall. Once again, Solidity provides a built-in address method in addition to an inline assembly version of it. The difference with a low-level call is that the target code is executed within the context of the calling contract, and msg.sender and msg.value do not change.?

Let’s analyze the following example to understand better how a delegatecallworks. Let’s start with the Greeter contract:

contract Greeter {
event Thanks(address sender, uint256 value);
function thanks() public payable {
emit Thanks(msg.sender, msg.value);
}
}

As you can see, the Greeter contract simply declares a thanks function that emits an event carrying the msg.value and msg.sender data. We can try this method by running the following lines in a truffle console:

truffle(develop)> compile
truffle(develop)> someone = web3.eth.accounts[0]
truffle(develop)> ETH_2 = new web3.BigNumber(‘2e18’)
truffle(develop)> Greeter.new().then(i => greeter = i)
truffle(develop)> opts = { from: someone, value: ETH_2 }
truffle(develop)> greeter.thanks(opts).then(tx => log = tx.logs[0])
truffle(develop)> log.event //Thanks
truffle(develop)> log.args.sender === someone //true
truffle(develop)> log.args.value.eq(ETH_2) //true

Now that we have confirmed its functionality, let’s pay attention to the Wallet contract:

contract Wallet {
Greeter internal greeter;

function Wallet() public {
greeter = new Greeter();
}

function () public payable {
bytes4 methodId = Greeter(0).thanks.selector;
require(greeter.delegatecall(methodId));
}
}

This contract only defines a fallback function that executes the Greeter#thanks method through a delegatecall. Let’s see what happens when we call Greeter#thanks through the Wallet contract:

truffle(develop)> Wallet.new().then(i => wallet = i)
truffle(develop)> wallet.sendTransaction(opts).then(r => tx = r)
truffle(develop)> logs = tx.receipt.logs
truffle(develop)> SolidityEvent = require(‘web3/lib/web3/event.js’)
truffle(develop)> Thanks = Object.values(Greeter.events)[0]
truffle(develop)> event = new SolidityEvent(null, Thanks, 0)
truffle(develop)> log = event.decode(logs[0])
truffle(develop)> log.event // Thanks
truffle(develop)> log.args.sender === someone // true
truffle(develop)> log.args.value.eq(ETH_2) // true

As you may have noticed, we have just confirmed that the delegatecallfunction preserves the msg.value and msg.sender .

This means that a contract can dynamically load code from a different address at runtime. Storage, current address and balance still refer to the calling contract, only the code is taken from the called address. This makes it possible to implement the ‘library’ feature in Solidity.” ?

There is one more thing we should explore about delegatecalls. As mentioned above, the storage of the calling contract is the one being accessed by the executed code. Let’s see the Calculator contract:

contract ResultStorage {
uint256 public result;
}
contract Calculator is ResultStorage {
Product internal product;
Addition internal addition;
function Calculator() public {
product = new Product();
addition = new Addition();
}

function add(uint256 x) public {
bytes4 methodId = Addition(0).calculate.selector;
require(addition.delegatecall(methodId, x));
}

function mul(uint256 x) public {
bytes4 methodId = Product(0).calculate.selector;
require(product.delegatecall(methodId, x));
}
}
contract Addition is ResultStorage {
function calculate(uint256 x) public returns (uint256) {
uint256 temp = result + x;
assert(temp >= result);
result = temp;
return result;
}
}
contract Product is ResultStorage {
function calculate(uint256 x) public returns (uint256) {
if (x == 0) result = 0;
else {
uint256 temp = result * x;
assert(temp / result == x);
result = temp;
}
return result;
}
}

The Calculator contract has just two functions: add and product. The Calculator contract doesn’t know how to add or multiply; it delegates those calls to the Addition and Product contracts respectively instead. However, all these contracts share the same state variable result to store the result of each calculation. Let’s see how this works:

truffle(develop)> Calculator.new().then(i => calculator = i)
truffle(develop)> calculator.addition().then(a => additionAddress=a)
truffle(develop)> addition = Addition.at(additionAddress)
truffle(develop)> calculator.product().then(a => productAddress = a)
truffle(develop)> product = Product.at(productAddress)
truffle(develop)> calculator.add(5)
truffle(develop)> calculator.result().then(r => r.toString()) // 5
truffle(develop)> addition.result().then(r => r.toString()) // 0
truffle(develop)> product.result().then(r => r.toString()) // 0
truffle(develop)> calculator.mul(2)
truffle(develop)> calculator.result().then(r => r.toString()) // 10
truffle(develop)> addition.result().then(r => r.toString()) // 0
truffle(develop)> product.result().then(r => r.toString()) // 0

We have just confirmed that we are using the storage of the Calculatorcontract. Besides that, the code being executed is stored in the Addition and in the Product contracts.

Additionally, as for the call function, there is a Solidity assembly opcode version for delegatecall. Let’s take a look at the Delegator contract to see how we can use it:

contract Implementation {
event ImplementationLog(uint256 gas);
function() public payable {
emit ImplementationLog(gasleft());
assert(false);
}
}
contract Delegator {
event DelegatorLog(uint256 gas);
Implementation public implementation;
function Delegator() public {
implementation = new Implementation();
}
function () public payable {
emit DelegatorLog(gasleft());
address _impl = implementation;
assembly {
let ptr := mload(0x40)
calldatacopy(ptr, 0, calldatasize)
let result := delegatecall(gas, _impl, ptr, calldatasize, 0, 0)
}

emit DelegatorLog(gasleft());
}
}

This time we are using inline assembly to execute the delegatecall. As you may have noticed, there is no value argument here, since msg.value will not change. You may be wondering why we are loading the 0x40 address, or what calldatacopy and calldatasize are. Don’t panic?—?we will describe them in the next post of the series. In the meantime, feel free to run the same commands over a truffle console to validate its behavior.

Once again, it’s important to understand clearly how a delegatecall works. Every triggered call will be sent from the current contract and not the delegate-called contract. Additionally, the executed code can read and write to the storage of the caller contract. If not implemented properly, even smallest of your mistakes can lead to losses in millions. Here is a list of most expensive mistakes in the history of Ethereum.

Data Management

The EVM manages different kinds of data depending on their context, and it does that in different ways. We can distinguish at least four main types of data: stack, calldata, memory, and storage, besides the contract code. Let’s analyze each of these:

Stack

The EVM is a stack machine, meaning that it doesn’t operate on registers but on a virtual stack. The stack has a maximum size of 1024. Stack items have a size of 256 bits; in fact, the EVM is a 256-bit word machine (this facilitates Keccak256 hash scheme and elliptic-curve computations). Here is where most opcodes consume their parameters from.

The EVM provides many opcodes to modify the stack directly. Some of these include:

  • POP removes item from the stack.
  • PUSHn places the following n bytes item in the stack, with n from 1 to 32.
  • DUPn duplicates the nth stack item, with n from 1 to 32.
  • SWAPn exchanges the 1st and nth stack item, with n from 1 to 32.

Calldata

The calldata is a read-only byte-addressable space where the data parameter of a transaction or call is held. Unlike the stack, to use this data you have to specify an exact byte offset and number of bytes you want to read.

The opcodes provided by the EVM to operate with the calldata include:

  • CALLDATASIZE tells the size of the transaction data.
  • CALLDATALOAD loads 32 bytes of the transaction data onto the stack.
  • CALLDATACOPY copies a number of bytes of the transaction data to memory.

Solidity also provides an inline assembly version of these opcodes. These are calldatasize, calldataload and calldatacopy respectively. The last one expects three arguments (t, f, s): it will copy s bytes of calldata at position f into memory at position t. In addition, Solidity lets you access to the calldata through msg.data.

As you may have noticed, we used some of these opcodes in some examples of the previous post. Let’s take a look at the inline assembly code block of a delegatecall again:

assembly {
let ptr := mload(0x40)
calldatacopy(ptr, 0, calldatasize)
let result := delegatecall(gas, _impl, ptr, calldatasize, 0, 0)
}

To delegate the call to the _impl address, we must forward msg.data. Given that the delegatecall opcode operates with data in memory, we need to copy the calldata into memory first. That’s why we use calldatacopy to copy all the calldata to a memory pointer (note that we are using calldatasize).

Let’s analyze another example using calldata. You will find a Calldatacontract in the exercise 3 folder with the following code:

contract Calldata {
function add(uint256 _a, uint256 _b) public view 
returns (uint256 result)
{
assembly {
let a := mload(0x40)
let b := add(a, 32)
calldatacopy(a, 4, 32)
calldatacopy(b, add(4, 32), 32)
result := add(mload(a), mload(b))
}
}
}

The idea here is to return the addition of two numbers passed by arguments. As you can see, once again we are loading a memory pointer reading from 0x40, but please ignore that for now; we will explain it right after this example. We are storing that memory pointer in the variable a and storing in b the following position which is 32-bytes right after a. Then we use calldatacopy to store the first parameter in a. You may have noticed we are copying it from the 4th position of the calldata instead of its beginning. This is because the first 4 bytes of the calldata hold the signature of the functionbeing called, in this case bytes4(keccak256("add(uint256,uint256)")); this is what the EVM uses to identify which function has to be executed on a call. Then, we store the second parameter in b copying the following 32 bytes of the calldata. Finally, we just need to calculate the addition of both values loading them from memory.

You can test this yourself with a truffle console running the following commands:

truffle(develop)> compile
truffle(develop)> Calldata.new().then(i => calldata = i)
truffle(develop)> calldata.add(1, 6).then(r => r.toString()) // 7

Memory

Memory is a volatile read-write byte-addressable space. It is mainly used to store data during execution, mostly for passing arguments to internal functions. Given this is volatile area, every message call starts with a cleared memory. All locations are initially defined as zero. As calldata, memory can be addressed at byte level, but can only read 32-byte words at a time.

Memory is said to “expand” when we write to a word in it that was not previously used. Additionally to the cost of the write itself, there is a cost to this expansion, which increases linearly for the first 724 bytes and quadratically after that.

The EVM provides three opcodes to interact with the memory area:

  • MLOAD loads a word from memory into the stack.
  • MSTORE saves a word to memory.
  • MSTORE8 saves a byte to memory.

Solidity also provides an inline assembly version of these opcodes.

There is another key thing we need to know about memory. Solidity always stores a free memory pointer at position 0x40, i.e. a reference to the first unused word in memory. That’s why we load this word to operate with inline assembly. Since the initial 64 bytes of memory are reserved for the EVM, this is how we can ensure that we are not overwriting memory that is used internally by Solidity. For instance, in the delegatecall example presented above, we were loading this pointer to store the given calldata to forward it. This is because the inline-assembly opcode delegatecall needs to fetch its payload from memory.

Additionally, if you pay attention to the bytecode output by the Solidity compiler, you will notice that all of them start with 0x6060604052…, which means:

PUSH1 : EVM opcode is 0x60
0x60 : The free memory pointer
PUSH1 : EVM opcode is 0x60
0x40 : Memory position for the free memory pointer
MSTORE : EVM opcode is 0x52

You must be very careful when operating with memory at assembly level. Otherwise, you could overwrite a reserved space.

Storage

Storage is a persistent read-write word-addressable space. This is where each contract stores its persistent information. Unlike memory, storage is a persistent area and can only be addressed by words. It is a key-value mapping of 2²?? slots of 32 bytes each. A contract can neither read nor write to any storage apart from its own. All locations are initially defined as zero.

The amount of gas required to save data into storage is one of the highest among operations of the EVM. This cost is not always the same. Modifying a storage slot from a zero value to a non-zero one costs 20,000. While storing the same non-zero value or setting a non-zero value to zero costs 5,000. However, in the last scenario when a non-zero value is set to zero, a refund of 15,000 will be given.

The EVM provides two opcodes to operate the storage:

  • SLOAD loads a word from storage into the stack.
  • SSTORE saves a word to storage.

These opcodes are also supported by the inline assembly of Solidity.

Solidity will automatically map every defined state variable of your contract to a slot in storage. The strategy is fairly simple?—?statically sized variables (everything except mappings and dynamic arrays) are laid out contiguously in storage starting from position 0.

For dynamic arrays, this slot (p) stores the length of the array and its data will be located at the slot number that results from hashing p(keccak256(p)). For mappings, this slot is unused and the value corresponding to a key k will be located at keccak256(k,p). Bear in mind that the parameters of keccak256 (k and p) are always padded to 32 bytes.

Let’s take a look at a code example to understand how this works. Inside the exercise 3 contracts folder you will find a Storage contract with the following code:

contract Storage {
uint256 public number;
address public account;
uint256[] private array;
mapping(uint256 => uint256) private map;
function Storage() public {
number = 2;
account = this;
array.push(10);
array.push(100);
map[1] = 9;
map[2] = 10;
}
}

Now, let’s open a truffle console to test its storage structure. First, we will compile and create a new contract instance:

truffle(develop)> compile
truffle(develop)> Storage.new().then(i => storage = i)

Then we can ensure that the address 0 holds a number 2 and the address 1 holds the address of the contract:

truffle(develop)> web3.eth.getStorageAt(storage.address, 0) // 0x02
truffle(develop)> web3.eth.getStorageAt(storage.address, 1) // 0x..

We can check that the storage position 2 holds the length of the array as follows:

truffle(develop)> web3.eth.getStorageAt(storage.address, 2) // 0x02

Finally, we can check that the storage position 3 is unused and the mapping values are stored as we described above:

truffle(develop)> web3.eth.getStorageAt(storage.address, 3) 
// 0x00
truffle(develop)> mapIndex = ‘0000000000000000000000000000000000000000000000000000000000000003’
truffle(develop)> firstKey = ‘0000000000000000000000000000000000000000000000000000000000000001’
truffle(develop)> firstPosition = web3.sha3(firstKey + mapIndex, { encoding: ‘hex’ })
truffle(develop)> web3.eth.getStorageAt(storage.address, firstPosition)
// 0x09
truffle(develop)> secondKey = ‘0000000000000000000000000000000000000000000000000000000000000002’
truffle(develop)> secondPosition = web3.sha3(secondKey + mapIndex, { encoding: ‘hex’ })
truffle(develop)> web3.eth.getStorageAt(storage.address, secondPosition)
// 0x0A

Great! We have demonstrated that the Solidity storage strategy works as we understand it! To learn more about how Solidity maps state variables into the storage, read the official documentation.

That’s it. I hope that now you have a much better understanding of what role EVM plays in Ethereum’s infrastucture.

Facebook Comments

More Stuff

Crypto already beats VISA’s transaction capacity Cryptocurrencies have long surpassed the transaction capacity of every traditional payment system in the world. This might be considered a controver...
The Caraga Campaign RootProject and Global Partnership for Sustainable Solutions (GPSS) are exploring an innovative, blockchain-powered funding initiative for urgently ne...
The Final 5 Episodes of Crypto Disrupted Crypto Disrupted Episode 35, 36, 37, 38, and 39: Interviews with David McCarville of Rylely Carlock and Applewhite, Fran Villalba Segerra founder of ...
The Future of ICOs? Security Token Offerings Distributed ledger technologies (DLTs) have given rise to the initial coin offering (ICO). An ICO is a fundraising mechanism used by DLT projects to r...
Spread the love

Posted by News Monkey