This is my first post in the Moonbeam forum, but in a sense I have long been involved in the community as the original author and maintainer of Frontier, the underlying EVM pallet that Moonbeam uses.
I was reading the Moonbeam Strategic Direction Update from last September, and regarding the direction of the Ethereum-compatible Polkadot Plaza / Hub (the revive project) in particular, I find it interesting that we basically reached the same conclusion independently. In my view, too, it is a huge strategic mistake to build something that is “almost compatible” but not “fully compatible”. User adoption is always problematic. It splits Polkadot’s own ecosystem. And Ethereum is a moving target: any deviation from upstream must be carefully considered to avoid engineering debt.
My further concern is security. We all know the difficult lessons we learned during the early days of Frontier/Moonbeam, and those mistakes are basically being repeated. The design of the reentry protection, as well as the precompile “framework”, in revive is questionable. Those problems, however, will only become apparent with further integration with the rest of the system. And that is before the most important issue: the deviations may cause a contract that works just fine on Ethereum mainnet to malfunction on Polkadot Hub. If funds are lost, we all know what happens: the core devs will be the ones who take all the blame. I have failed, however, to advocate these concerns within Parity.
The main reason Parity decided to do what they’re doing with revive is VM performance. So, politics aside, the important question from Moonbeam’s perspective is whether that VM performance is achievable without the drawbacks of “almost compatibility”.
Solution 1: PolkaVM alongside EVM
If you look at past public discussions regarding revive, many of its design decisions were made solely to avoid including an EVM interpreter: EVM is so bad that we want to avoid it at all costs, yet Ethereum is so good that we want compatibility with it. If we remove that constraint, everything suddenly becomes easy: instead of removing EVM completely, we deploy a PolkaVM executor alongside an EVM interpreter.
In this first solution, each PolkaVM program is an independent smart contract. In the simplest form, you can think of each PolkaVM contract as an EVM precompile, with the two sides interacting by the same rules.
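To make the "one VM per contract" idea concrete, here is a minimal sketch of the dispatch. All names (`Code`, `Backend`, `Executor`, `run_evm`, `run_polkavm`) are hypothetical and exist only for illustration; this is not Frontier's or revive's API, and both backends are stubs that simply echo their input.

```rust
use std::collections::HashMap;

/// 20-byte account address (hypothetical alias for this sketch).
type Address = [u8; 20];

/// Which VM a deployed contract targets. The choice is made per contract,
/// so a call boundary is the only place where a VM switch can happen.
#[derive(Debug, Clone, PartialEq)]
enum Code {
    Evm(Vec<u8>),
    PolkaVm(Vec<u8>),
}

/// Which backend ended up handling a call.
#[derive(Debug, PartialEq)]
enum Backend {
    EvmInterpreter,
    PolkaVmExecutor,
}

#[derive(Default)]
struct Executor {
    contracts: HashMap<Address, Code>,
}

impl Executor {
    fn deploy(&mut self, addr: Address, code: Code) {
        self.contracts.insert(addr, code);
    }

    /// Route a call to whichever backend owns the target contract.
    /// Returns `None` when there is no code at `addr`.
    fn call(&self, addr: &Address, input: &[u8]) -> Option<(Backend, Vec<u8>)> {
        match self.contracts.get(addr)? {
            Code::Evm(code) => Some((Backend::EvmInterpreter, run_evm(code, input))),
            Code::PolkaVm(blob) => Some((Backend::PolkaVmExecutor, run_polkavm(blob, input))),
        }
    }
}

/// Stub standing in for an EVM interpreter: echoes the input.
fn run_evm(_code: &[u8], input: &[u8]) -> Vec<u8> {
    input.to_vec()
}

/// Stub standing in for a PolkaVM executor: echoes the input.
fn run_polkavm(_blob: &[u8], input: &[u8]) -> Vec<u8> {
    input.to_vec()
}
```

From the caller's point of view nothing changes: it calls an address and gets return data, without needing to know which VM ran the callee.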
Implementation-wise, we leave the Solidity compilation toolchain completely untouched. From Solidity source code we get EVM bytecode. We then use the revmc project from Paradigm, which is an EVM-bytecode-to-LLVM recompiler, and extend it to compile down to PolkaVM bytecode. Then, problem solved!
Performance-wise, there is only one small issue: EVM bytecode discards the static jump information present in the Solidity source and has only dynamic jumps. However, we now have EOF, and both Solidity and revmc already support it, so we just enable that. With this one final trick we should have no performance disadvantage compared with revive.
Because we have the EVM interpreter, PolkaVM bytecode can still emit EVM bytecode and everything (including factory contracts) works as normal.
Everything that can be “optimized” in revive can be optimized in this solution, and we retain full compatibility.
Solution 2: PolkaVM inside EVM
Solution 1 above can be developed immediately. There is, however, one thing that may bother some people: in that design, the smallest unit at which execution can switch from PolkaVM to EVM, and vice versa, is a contract. This can already yield considerable performance gains in most situations. But we can still do better.
If a contract is performance-bottlenecked, it almost always contains two parts: one that deals with system interactions (balance transfer, log emission, call/create, etc.), which is not performance-sensitive, and another that runs the really performance-sensitive algorithms. If we can have PolkaVM and EVM within the same contract, we can avoid some further context-switching costs.
This is made possible by EOF+EIP-7960. In EOF, a single contract can have multiple functions and they can call each other. EIP-7960 further extends that to give a type to each function section. We then have a new type for PolkaVM. Then EVM functions can call PolkaVM functions, and vice versa.
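As a rough illustration of typed function sections, the sketch below models a container whose sections each carry a VM tag and can call one another across the VM boundary. Everything here is an assumption made for illustration: the names (`SectionKind`, `CodeSection`, `Container`), the depth limit, and especially the toy body encoding (each body byte is just the index of a section to call next). It does not reflect the actual EOF container format.

```rust
/// Hypothetical type tag attached to each function section.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SectionKind {
    Evm,
    PolkaVm,
}

struct CodeSection {
    kind: SectionKind,
    /// Toy encoding for this sketch only: each byte is the index of a
    /// section that this function calls, in order.
    body: Vec<u8>,
}

struct Container {
    sections: Vec<CodeSection>,
}

/// Assumed call-depth limit, to keep the toy interpreter from recursing forever.
const MAX_DEPTH: usize = 1024;

impl Container {
    /// Execute function section `index`, appending the VM of every section
    /// that runs to `trace`. A real engine would hand the body to the EVM
    /// interpreter or the PolkaVM executor depending on `kind`.
    fn call(&self, index: usize, depth: usize, trace: &mut Vec<SectionKind>) -> Result<(), String> {
        if depth >= MAX_DEPTH {
            return Err("call depth exceeded".to_string());
        }
        let section = self
            .sections
            .get(index)
            .ok_or_else(|| format!("no section {index}"))?;
        trace.push(section.kind);
        for &callee in &section.body {
            self.call(callee as usize, depth + 1, trace)?;
        }
        Ok(())
    }
}
```

The point of the sketch is that the switch between VMs happens at a function call inside one contract, not at a contract call, so the system-interaction part can stay in EVM while the hot loop lives in a PolkaVM section.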
This solution is much “cleaner”, in that we basically don’t need to deal with “compatibility” issues at all. There’s no need for any EVM bytecode recompilation, just new PolkaVM bytecode. In practice this will actually work better than revive or our solution 1, because most contracts are not VM-performance-bottlenecked. The drawback is also obvious: unlike solution 1, we cannot develop it immediately but must consult with the Ethereum AllCoreDevs, and it will require extensive toolchain support (which takes a long time to develop).
Solution 3: Beyond Polkadot
If we remove the requirement that Moonbeam must be a Polkadot parachain, more things become possible. The parachain environment places a rather hard restriction on what can be done: because all parachain execution environments must be the same, we can’t add new runtime APIs. If we disregard that, however, we can add a new runtime API that moves the EVM interpreter from the runtime to the node.
In a recent discussion with the Nervos team, and while checking how Paradigm’s revmc actually worked in production in Reth, I was introduced to a new design pattern that may completely change our view on how we optimize on-chain smart contract bytecode. The current view from Polkadot is that there are huge restrictions on an on-chain JIT/AOT recompiler. It must be really fast, and it must avoid JIT bombs. This means we’re left with a single-pass recompiler that cannot do much optimization. The bytecode must then be easy to recompile and map as closely to native code as possible. That’s why we spent so much effort searching for the “perfect” instruction set: from EVM to Wasm to eBPF and finally to PolkaVM.
What if we’re wrong?
The Polkadot view assumes that recompilation always happens immediately before a contract is run, and therefore that the combined compile+run benchmark is what matters most. This does not have to be the case, because the compilation only needs to happen once! One can therefore imagine the following design:
- A main thread that handles normal block/transaction processing. It uses an already-compiled blob when available, or the interpreter as a fallback.
- An optimization thread in the background that gradually optimizes contracts in the state.
We can imagine that the optimization thread would first try a simple (inefficient) one-pass AOT, and when time permits, try an advanced LLVM AOT with aggressive optimization.
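The two-thread design can be sketched roughly as follows. This is only a toy model under stated assumptions: `CodeHash`, `Tier`, `Cache`, `select_tier` and `spawn_optimizer` are hypothetical names, and the "compilation" steps are placeholders that merely record which tier is available rather than producing machine code.

```rust
use std::collections::HashMap;
use std::sync::{mpsc, Arc, RwLock};
use std::thread;

/// Hypothetical identifier of a contract's bytecode.
type CodeHash = u64;

/// Best execution tier currently available for a contract.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Interpreter,
    OnePassAot,
    OptimizedAot,
}

/// Shared table of compiled artifacts (here reduced to just the tier).
type Cache = Arc<RwLock<HashMap<CodeHash, Tier>>>;

/// Main-thread lookup: use the best compiled blob if one exists,
/// otherwise fall back to the interpreter. Never blocks on compilation.
fn select_tier(cache: &Cache, hash: CodeHash) -> Tier {
    cache
        .read()
        .unwrap()
        .get(&hash)
        .copied()
        .unwrap_or(Tier::Interpreter)
}

/// Background optimizer: for every requested contract, publish a cheap
/// one-pass result first and then replace it with the optimized one.
/// The loop ends once every sender has been dropped.
fn spawn_optimizer(cache: Cache, requests: mpsc::Receiver<CodeHash>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for hash in requests {
            // Placeholder for a fast single-pass AOT compile.
            cache.write().unwrap().insert(hash, Tier::OnePassAot);
            // Placeholder for a slow, aggressively optimizing LLVM compile.
            cache.write().unwrap().insert(hash, Tier::OptimizedAot);
        }
    })
}
```

The main thread only ever reads the cache, so a slow compile delays the speedup but never delays block processing. For consensus, every tier has to produce exactly the same results and gas usage as the interpreter.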
If one compiles EVM bytecode this way, it’ll result in runtime performance comparable to revive/PolkaVM. And this is without any specification changes. All EVM contracts function just as before, and they take advantage of the new performance improvements!
However, there is one thing left to solve: the optimization thread must have a suitable algorithm that chooses which bytecode to optimize and which JIT bombs to avoid. The related gas metering problems must also be solved. Given that modern CPUs have so many cores, and that block validation can usually use only one or two of them, I’m really confident that this is solvable.
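One very naive way to frame the selection problem, purely as a starting point: rank contracts by how much gas they recently consumed, and skip anything whose bytecode is too large to compile within budget. The `Candidate` fields, the `next_to_optimize` function and the size-based cutoff are all assumptions of this sketch; a real defence against JIT bombs would also need a hard limit on compile time and memory, since small bytecode can still be expensive to optimize.

```rust
/// A contract the optimizer could work on next (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    code_hash: u64,
    /// Gas consumed by this contract over some recent window.
    recent_gas: u64,
    /// Bytecode size in bytes, used as a crude proxy for compile cost.
    code_len: usize,
}

/// Pick the hottest contract whose bytecode fits within `max_code_len`.
/// Returns `None` if no candidate fits the budget.
fn next_to_optimize(cands: &[Candidate], max_code_len: usize) -> Option<&Candidate> {
    cands
        .iter()
        .filter(|c| c.code_len <= max_code_len)
        .max_by_key(|c| c.recent_gas)
}
```

Contracts that are skipped simply keep running on the interpreter, so a bad heuristic costs performance but not correctness.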
If this design is possible, unfortunately, it’ll also mean that the years we’ve spent searching for the “perfect” instruction set were in vain.
Take a step back: does VM performance matter at all?
Taking a step back, the question is also whether what Polkadot Hub is doing makes sense at all. Maybe we’re building the “coolest” tech that doesn’t actually solve the problem.
Ethereum core devs commonly hold the belief that the real bottleneck is not VM performance but IO access. The revmc recompiler achieves 10x+ performance improvements for selected contracts. However, last year the Reth team ran a benchmark, syncing historical Ethereum blocks using revmc, and found only 1-10% performance improvements. This may mean that the Ethereum core devs are right, and that we have just been working on the wrong problem for all those years.
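A back-of-the-envelope way to read those numbers is Amdahl's law. The sketch below assumes, for simplicity, that the 10x speedup applies uniformly to all VM execution and that "1-10% improvement" means an overall speedup of 1.01x to 1.10x; neither assumption comes from the benchmark itself, and the function names are my own.

```rust
/// Amdahl's law: overall speedup when a fraction `f` of the total runtime
/// is accelerated by a factor `s` and the rest is unchanged.
fn overall_speedup(f: f64, s: f64) -> f64 {
    1.0 / ((1.0 - f) + f / s)
}

/// Inverse of the above: the fraction of runtime that must have been
/// accelerated by `s` in order to observe the given overall speedup.
fn implied_fraction(overall: f64, s: f64) -> f64 {
    (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)
}
```

Under these assumptions, `implied_fraction(1.10, 10.0)` comes out at roughly 0.10 and `implied_fraction(1.01, 10.0)` at roughly 0.01, which would put pure VM execution at only about 1-10% of total sync time, with the rest spent elsewhere.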
Remember, the core question is blockchain throughput. VM performance is only a sub-question of that, and in the case of EVM it may not be the core bottleneck right now. We may need to carefully consider what is actually important. Maybe prioritizing the integration of Rob’s NOMT will bring us greater performance gains than PolkaVM.