This article presumes you know what is:
- (LLVM-liked) SSA form IR.
- Control-flow graph.
- WASM and its text form WAT.
The basic idea of WASM's control flow instructions1
Just like high-level languages, wasm supports if-then-else statements:
(local $var i32) (if (i32.const 0) (then (local.set $var (i32.const 1))) (else (local.set $var (i32.const 0))) )
else part can be omitted.
Note that although we fold the expressions in this way, the wasm instructions are actually organized in a different manner2:
(local $var i32) i32.const 0 if i32.const 1 local.set $var else i32.const 0 local.set $var end
And that's also how we need to generate our WebAssembly code.
end instruction is omitted when we fold the expressions)
loop means that in the following code, until the corresponding
end, we can jump to the top of the block using the
(loop $my_loop ;; do something br $my_loop ;; continue the loop )
br is executed, the loop will terminate when the end of its content is reached.
block means that in the following code, until the corresponding
end, we can jump out of the block using the
(block $my_block ;; do something (br_if $my_block ( i32.lt_s (global.get $i) (i32.const 10) )) ;; do other things, will not be executed if br_if is taken )
loop themselves don't have any special usage, they just change the meaning of
br inside them.
br_if are used to jump to a "label", which can be either a
loop or a
br_if is used to jump to a label only if the condition (the value on top of the stack) is true.
br_if can only break/continue a loop/block they are in; they cannot jump to an arbitrary place. This is because, at the binary code level, the argument of
br_if indicates how many levels of
block we need to jump out of.
There are also some other instructions like
return, but they are not important for this article, so they will not be mentioned here.
The key problem: cannot branch to arbitrary place
LLVM IR has a
br instruction, which can jump to an arbitrary place, making this CFG easily representable in LLVM:
But it is not representable in WASM using only the control flow instructions we have.3
Fortunately, there is a very simple method to represent any CFGs by adding a "next block" variable and wrapping the entire program in a
The naive algorithm4
The basic idea of this method is to disregard all the fancy control flow instructions provided by wasm and manage everything ourselves.
We can place everything in a loop, use a variable to record the next block to execute, and employ
br to jump to the next block:
(local $next_block i32) (local.set $next_block (i32.const 0)) (loop $my_loop (local.get $next_block) (if (i32.eq (local.get $next_block) (i32.const 0)) (then ;; A content here ;; $next_block = B (local.set $next_block (i32.const 1)) br $my_loop )) (if (i32.eq (local.get $next_block) (i32.const 1)) (then ;; B content here (if (local.get $cond_1) (then ;; $next_block = C (local.set $next_block (i32.const 2))) (else ;; $next_block = D (local.set $next_block (i32.const 3))) ) br $my_loop )) (if (i32.eq (local.get $next_block) (i32.const 2)) (then ;; C content here ;; omitted )) (if (i32.eq (local.get $next_block) (i32.const 3)) (then ;; D content here ;; omitted )) ;; rest omitted )
It is easy to see the problem with this algorithm: the generated code contains a lot of (unnecessary) access to the
$next_block variable and conditional judgments. It would be better to have an alternative algorithm that can make use of the control flow instructions provided by WebAssembly.
In the Emscripten paper, such an algorithm is introduced.
The Relooper Algorithm
The basic idea of the relooper algorithm is to identify certain patterns in the control flow graph and generate code to handle these patterns.
Finding a (Simple) Loop in CFG
A simple loop, which can be represented with a
loop and several
brs, can be identified by finding cycles5 in the CFG that have:
- only one way to exit the loop
- only one way to enter the loop
Can be translated into:
;; wasm corresponding to `From` (loop $my_loop ;; wasm corresponding to `A` ;; we presume basic block `A` put branch condition onto top of the stack (if ;; load from stack top (then ;; wasm corresponding to `B` ;; wasm corresponding to `C` ;; wasm corresponding to `D` ;; ... br $my_loop ) ) ) ;; wasm corresponding to `To`
It's also possible that the entry of the loop is different from the exit of the loop, for example:
can be translated to:
;; wasm corresponding to `From` (loop $my_loop ;; wasm corresponding to `A` ;; wasm corresponding to `B` ;; wasm corresponding to `C` ;; we presume basic block `C` put branch condition onto top of the stack (if ;; load from stack top (then ;; wasm corresponding to `D` ;; ... br $my_loop ) ) ) ;; wasm corresponding to `To`
Find a conditional branch in CFG
A conditional branch, which can be represented with an
if-then-else, can be identified by finding a node that:
- has two different groups of successors, in which
- the node itself is the one and only entry for each group of successors6
- the successor groups can "meet" after leaving their exit node
Note that one of the groups can be empty.
can be translated to:
;; wasm corresponding to `From` ;; wasm corresponding to `A` ;; we presume basic block `A` put branch condition onto top of the stack (if ;; load from stack top (then ;; wasm corresponding to `B` ;; wasm corresponding to `C` ) (else ;; wasm corresponding to `D` ;; wasm corresponding to `E` ) ) ;; wasm corresponding to `To`
Combine Them Together
So, what we need to do is recognize these two patterns in a CFG and emit code for them.
We can recognize the nodes one by one:
- If we can reach only one other node from this node, and we can never come back, we can just emit code for this node and continue to the next node.
- If the node is the entry node of a (simple) loop, we can emit the code according to the algorithm for loops mentioned above and continue to the next node.
- If the node is the entry node of a conditional branch, we can emit the code according to the algorithm for conditional branches mentioned above and continue to the next node.
- Otherwise, we can fall back to the naive algorithm for all blocks that can be reached from this node.
And that's the basic idea of the Relooper algorithm.
The issue is that wasm's control flow instructions are more powerful than the code generated by the relooper algorithm. For example:
For this CFG, the relooper algorithm can do nothing but fall back to the naive algorithm. However, it can be translated into wasm in the following way:
(block $before_E (block $before_D ;; A content here (block $before_C (if ;; should branch to B (then ;; B content here br $before_C)) (if ;; should branch to C (then br $before_C)) (if ;; should branch to D (then br $before_D)) (if ;; should branch to E (then br $before_E)) ) ;; C content here ) ;; D content here ) ;; E content here
The stackifier algorithm
Part 1: How to deal with cycles with multiple entries?
During the observations above, we may find that we cannot handle cycles with multiple entries. For example:
In this CFG, we can find a cycle
B->C->D->B with multiple entries (
E->D), which is difficult to handle.
In the relooper algorithm, we simply fall back to the naive algorithm for all blocks that can be reached from this node.
However, we may not need to apply the naive algorithm to all blocks. Instead, we can use the idea of using labels to "branch" only to the entry nodes of the cycle.
We can add a dispatcher node and redirect all the incoming edges of
D to the dispatcher node.
The Dispatcher will create a label variable, and:
Awill set it to
Ewill set it to
Cwill set it to
Cwill jump to
Dby itself, so we don't set it to
Dwill set it according to the condition judgment.
After this transformation, we can handle the CFG as if it's a cycle with only one entry node.
A problem here is that loops may be nested within each other, e.g.
After applying the process mentioned earlier, we will get:
After this process,
123456 & Dispatcher still form a loop. To continue the process, we need to omit the back edges connecting to the created dispatcher when looking for new loops.
Now we can continue with the next loop
456, and we will finally get:
Part 2: How to handle CFGs without cycles with multiple entries?
The remaining part can also be handled by the relooper algorithm, but you may find that the relooper algorithm is a greedy algorithm that may not provide the best result, because it cannot assume there are no cycles with multiple entries in the CFG.
After removing cycles with multiple entries, the stackifier algorithm can now provide a better solution to this problem.
Topological sort is a useful tool when dealing with code scheduling. Basically, it ensures that if a node
A is reached before
B, then it is ordered prior to
We cannot perform a traditional topological sort algorithm on a CFG, because it's not a DAG.
However, we can remove all the back edges from the CFG, and then perform a topological sort on the resulting DAG. The stackifier algorithm adds another requirement: the entry node of the loop must be ordered before ALL the nodes in the loop7, and we must visit ALL nodes inside a loop before visit any other nodes, so that when generating wasm code, we can "cut" the content of this loop out of the whole block sequence.
Then the process is easy, just:
- Wrap each loop in
- For each forward edge where the source and destination blocks are not consecutive, we can put a
blockat the beginning of the function8, use
breakto jump to the destination block, and put
endbefore the target block.
After the topological sort, the block order is
First, simply wrap the loops in
And the currently generated code looks like this:
;; A loop $loop2 ;; B loop $loop1 ;; C ;; D ;; E ;; F end ;; G end ;; HIKJLMNO
A -> H,
H -> J, and
L -> N are forward edges where the source and destination blocks are not consecutive. We create blocks for them one by one.
block $AH ;; A br_if $AH loop $loop2 ;; B loop $loop1 ;; C ;; D ;; E ;; F end ;; G end end ;; H ;; IKJLMNO
block $HJ block $AH ;; A br_if $AH loop $loop2 ;; B loop $loop1 ;; C ;; D ;; E ;; F end ;; G end end ;; H br_if $HJ ;; I ;; K end ;; J ;; LMNO
block $LN block $HJ block $AH ;; A br_if $AH loop $loop2 ;; B loop $loop1 ;; C ;; D ;; E ;; F end ;; G end end ;; H br_if $HJ ;; I ;; K end ;; J ;; L br_if $LN ;; M end ;; N ;; O
And now we can translate the CFG into wasm by simply filling in the content of basic blocks.
Generating more human-readable WAT
Although we have generated the wasm code, the control flow structure we are using (
br_if) to represent conditional branches is not easily understandable for humans; we prefer if-else.
We can observe that if a block has only one forward predecessor, and the predecessor is a branch block, it can be nested into the
else branch of this predecessor.
So when performing a topological sort of the blocks, after we add a block, we first visit the successors for which the block is their only forward predecessor. This makes it possible to be nested in branches of this block.
Personally, I think how wasm handles control flow is a bit weird.
block is a real instruction instead of some meta information on the code,
br can mean either
continue depending on whether it's in a
loop, but it cannot be used to "branch" to an arbitrary place, and wasm provides both
if, which seems non-orthogonal.
Although many developers are asking for arbitrary labels and gotos to be added into wasm, I think it's hopeless because wasm has already gone too far to adopt such a significant design change.
else, and even
end are "real" instructions, which have corresponding binary forms. It's quite strange for an "asm" to have these structures, but that's how wasm does things, and they are useful for a stack-based VM.
For those who are very familiar with the properties of CFG, you may have known that this is because the CFG is not reducible.
Or, the node "dominates" every node in the group of successors.
Or, in a more professional statement, strongly connected component, because there can be sub-cycles inside the outer cycle.
This algorithm was first introduced by a paper in 1973.
This means you should visit everything inside the loop first before going out of the loop when applying a DFS-based topological sort algorithm.
wasm doesn't seem to care much about how deeply blocks are nested.