NB

This article presumes you know what is:

  • (LLVM-liked) SSA form IR.
  • Control-flow graph.
  • WASM and its text form WAT.

The basic idea of WASM's control flow instructions1

if-then-else

Just like high-level languages, wasm supports if-then-else statements:

(local $var i32)
(if (i32.const 0) (then
    (local.set $var (i32.const 1))) 
(else
    (local.set $var (i32.const 0)))
)

the else part can be omitted.

Note that although we fold the expressions in this way, the wasm instructions are actually organized in a different manner2:

(local $var i32)
i32.const 0
if
i32.const 1
local.set $var
else
i32.const 0
local.set $var
end

And that's also how we need to generate our WebAssembly code.

(Note: The end instruction is omitted when we fold the expressions)

loop + br

loop means that in the following code, until the corresponding end, we can jump to the top of the block using the br instruction.

(loop $my_loop
    ;; do something
    br $my_loop ;; continue the loop
)

If no br is executed, the loop will terminate when the end of its content is reached.

block + br

block means that in the following code, until the corresponding end, we can jump out of the block using the br instruction.

(block $my_block
  ;; do something
  (br_if $my_block (
        i32.lt_s (global.get $i) (i32.const 10)
  ))
  ;; do other things, will not be executed if br_if is taken
)

block and loop themselves don't have any special usage, they just change the meaning of br inside them.

br/br_if

br and br_if are used to jump to a "label", which can be either a loop or a block.

br_if is used to jump to a label only if the condition (the value on top of the stack) is true.

Note that br and br_if can only break/continue a loop/block they are in; they cannot jump to an arbitrary place. This is because, at the binary code level, the argument of br and br_if indicates how many levels of loop and block we need to jump out of.

There are also some other instructions like br_table and return, but they are not important for this article, so they will not be mentioned here.

The key problem: cannot branch to arbitrary place

LLVM IR has a br instruction, which can jump to an arbitrary place, making this CFG easily representable in LLVM:

flowchart TD id0(Entry) id0 --> A A --> B B --> C F --> D B --> D D --> E C --> E C --> F D --> G E --> B E --> G F --> H G --> H

But it is not representable in WASM using only the control flow instructions we have.3

Fortunately, there is a very simple method to represent any CFGs by adding a "next block" variable and wrapping the entire program in a loop:

The naive algorithm4

The basic idea of this method is to disregard all the fancy control flow instructions provided by wasm and manage everything ourselves.

We can place everything in a loop, use a variable to record the next block to execute, and employ br to jump to the next block:

(local $next_block i32)
(local.set $next_block (i32.const 0))
(loop $my_loop
    (local.get $next_block)
    (if (i32.eq (local.get $next_block) (i32.const 0)) (then
        ;; A content here
        ;; $next_block = B
        (local.set $next_block (i32.const 1))
        br $my_loop
    ))
    (if (i32.eq (local.get $next_block) (i32.const 1)) (then
        ;; B content here
        (if (local.get $cond_1) (then
            ;; $next_block = C
            (local.set $next_block (i32.const 2)))
        (else
            ;; $next_block = D
            (local.set $next_block (i32.const 3)))
        )
        br $my_loop
    ))
    (if (i32.eq (local.get $next_block) (i32.const 2)) (then
        ;; C content here
        ;; omitted
    ))
    (if (i32.eq (local.get $next_block) (i32.const 3)) (then
        ;; D content here
        ;; omitted
    ))
    ;; rest omitted
)

It is easy to see the problem with this algorithm: the generated code contains a lot of (unnecessary) access to the $next_block variable and conditional judgments. It would be better to have an alternative algorithm that can make use of the control flow instructions provided by WebAssembly.

In the Emscripten paper, such an algorithm is introduced.

The Relooper Algorithm

The basic idea of the relooper algorithm is to identify certain patterns in the control flow graph and generate code to handle these patterns.

Finding a (Simple) Loop in CFG

A simple loop, which can be represented with a loop and several brs, can be identified by finding cycles5 in the CFG that have:

  • only one way to exit the loop
  • only one way to enter the loop

For example:

flowchart LR subgraph From Entry(...) end subgraph To F(...) end subgraph Content A B C D E end Entry --> A A --> B B --> C C --> D D --> E(...) E --> A A --> F

Can be translated into:

;; wasm corresponding to `From`
(loop $my_loop
    ;; wasm corresponding to `A`
    ;; we presume basic block `A` put branch condition onto top of the stack
    (if ;; load from stack top
        (then 
            ;; wasm corresponding to `B`
            ;; wasm corresponding to `C`
            ;; wasm corresponding to `D`
            ;; ...
            br $my_loop
        )
    )
)
;; wasm corresponding to `To`

It's also possible that the entry of the loop is different from the exit of the loop, for example:

flowchart LR subgraph From Entry(...) end subgraph To F(...) end subgraph Content A B C D E end Entry --> A A --> B B --> C C --> D D --> E(...) E --> A C --> F

can be translated to:

;; wasm corresponding to `From`
(loop $my_loop
    ;; wasm corresponding to `A`
    ;; wasm corresponding to `B`
    ;; wasm corresponding to `C`
    ;; we presume basic block `C` put branch condition onto top of the stack
    (if ;; load from stack top
        (then 
            ;; wasm corresponding to `D`
            ;; ...
            br $my_loop
        )
    )
)
;; wasm corresponding to `To`

Find a conditional branch in CFG

A conditional branch, which can be represented with an if-then-else, can be identified by finding a node that:

  • has two different groups of successors, in which
    • the node itself is the one and only entry for each group of successors6
    • the successor groups can "meet" after leaving their exit node

Note that one of the groups can be empty.

For example:

flowchart TD subgraph From FromNode(...) end subgraph To ToNode(...) end subgraph Content1 B --> C end subgraph Content2 D --> E end FromNode --> A A --> B A --> D C --> ToNode E --> ToNode

can be translated to:

;; wasm corresponding to `From`
;; wasm corresponding to `A`
;; we presume basic block `A` put branch condition onto top of the stack
(if ;; load from stack top
    (then 
        ;; wasm corresponding to `B`
        ;; wasm corresponding to `C`
    )
    (else 
        ;; wasm corresponding to `D`
        ;; wasm corresponding to `E`
    )
)
;; wasm corresponding to `To`

Combine Them Together

So, what we need to do is recognize these two patterns in a CFG and emit code for them.

We can recognize the nodes one by one:

  • If we can reach only one other node from this node, and we can never come back, we can just emit code for this node and continue to the next node.
  • If the node is the entry node of a (simple) loop, we can emit the code according to the algorithm for loops mentioned above and continue to the next node.
  • If the node is the entry node of a conditional branch, we can emit the code according to the algorithm for conditional branches mentioned above and continue to the next node.
  • Otherwise, we can fall back to the naive algorithm for all blocks that can be reached from this node.

And that's the basic idea of the Relooper algorithm.

Disadvantages

The issue is that wasm's control flow instructions are more powerful than the code generated by the relooper algorithm. For example:

flowchart TD A --> B A --> C A --> D A --> E B --> C C --> D D --> E

For this CFG, the relooper algorithm can do nothing but fall back to the naive algorithm. However, it can be translated into wasm in the following way:

(block $before_E
    (block $before_D
        ;; A content here
        (block $before_C
            (if ;; should branch to B
                (then
                    ;; B content here
                    br $before_C))
            (if ;; should branch to C
                (then br $before_C))
            (if ;; should branch to D
                (then br $before_D))
            (if ;; should branch to E
                (then br $before_E))
        )
        ;; C content here
    )
    ;; D content here
)
;; E content here

The stackifier algorithm

Part 1: How to deal with cycles with multiple entries?

During the observations above, we may find that we cannot handle cycles with multiple entries. For example:

flowchart TD A --> B B --> C C --> D D --> B A --> E E --> D D --> F

In this CFG, we can find a cycle B->C->D->B with multiple entries (A->B and E->D), which is difficult to handle.

In the relooper algorithm, we simply fall back to the naive algorithm for all blocks that can be reached from this node.

However, we may not need to apply the naive algorithm to all blocks. Instead, we can use the idea of using labels to "branch" only to the entry nodes of the cycle.

We can add a dispatcher node and redirect all the incoming edges of B and D to the dispatcher node.

flowchart TD A --> Dispatcher B --> C C --> D D --> Dispatcher A --> E E --> Dispatcher Dispatcher --> B Dispatcher --> D D --> F

The Dispatcher will create a label variable, and:

  • A will set it to B
  • E will set it to D
  • C will set it to B (C will jump to D by itself, so we don't set it to D)
  • D will set it according to the condition judgment.

After this transformation, we can handle the CFG as if it's a cycle with only one entry node.

A problem here is that loops may be nested within each other, e.g.

graph TD 0 --> 1 0 --> 2 1 --> 2 2 --> 3 3 --> 1 3 --> 4 4 --> 1 2 --> 6 4 --> 6 6 --> 5 5 --> 4

After applying the process mentioned earlier, we will get:

graph TD 0 --> Dispatcher Dispatcher --> 1 Dispatcher --> 2 2 --> 3 3 --> 4 4 --> 6 6 --> 5 5 --> 4 2 --> 6 1 --> Dispatcher 3 --> Dispatcher 4 --> Dispatcher

After this process, 123456 & Dispatcher still form a loop. To continue the process, we need to omit the back edges connecting to the created dispatcher when looking for new loops.

graph TD 0 --> Dispatcher Dispatcher --> 1 Dispatcher --> 2 2 --> 3 3 --> 4 4 --> 6 6 --> 5 5 --> 4 2 --> 6 1 -.-> Dispatcher 3 -.-> Dispatcher 4 -.-> Dispatcher

Now we can continue with the next loop 456, and we will finally get:

graph TD 0 --> 7 1 -.-> 7 2 --> 3 2 --> 8 3 -.-> 7 3 --> 8 4 -.-> 7 4 -.-> 8 5 -.-> 8 6 --> 5 7 --> 1 7 --> 2 8 --> 4 8 --> 6

Part 2: How to handle CFGs without cycles with multiple entries?

The remaining part can also be handled by the relooper algorithm, but you may find that the relooper algorithm is a greedy algorithm that may not provide the best result, because it cannot assume there are no cycles with multiple entries in the CFG.

After removing cycles with multiple entries, the stackifier algorithm can now provide a better solution to this problem.

Topological Ordering

Topological sort is a useful tool when dealing with code scheduling. Basically, it ensures that if a node A is reached before B, then it is ordered prior to B.

We cannot perform a traditional topological sort algorithm on a CFG, because it's not a DAG.

However, we can remove all the back edges from the CFG, and then perform a topological sort on the resulting DAG. The stackifier algorithm adds another requirement: the entry node of the loop must be ordered before ALL the nodes in the loop7, and we must visit ALL nodes inside a loop before visit any other nodes, so that when generating wasm code, we can "cut" the content of this loop out of the whole block sequence.

Then the process is easy, just:

  • Wrap each loop in loop.
  • For each forward edge where the source and destination blocks are not consecutive, we can put a block at the beginning of the function8, use break to jump to the destination block, and put end before the target block.

Example

flowchart TD Entry --> A A --> B B --> C C --> D D --> E E --> G E --> F F --> C G --> B G --> O A --> H H --> I I --> K K --> L H --> J J --> L L --> M M --> N L --> N N --> O O --> Exit

After the topological sort, the block order is ABCDEFGHIKJLMNO.

First, simply wrap the loops in loop instructions.

flowchart TD Entry --> A A --> B subgraph Loop2 B --> C subgraph Loop1 C --> D D --> E E --> F end E --> G F --> C G --> B end G --> O A --> H H --> I I --> K K --> L H --> J J --> L L --> M M --> N L --> N N --> O O --> Exit

And the currently generated code looks like this:

;; A
loop $loop2
    ;; B 
    loop $loop1
        ;; C
        ;; D
        ;; E
        ;; F 
    end
    ;; G
end
;; HIKJLMNO

A -> H, H -> J, and L -> N are forward edges where the source and destination blocks are not consecutive. We create blocks for them one by one.

block $AH
    ;; A
    br_if $AH
    loop $loop2
        ;; B 
        loop $loop1
            ;; C
            ;; D
            ;; E
            ;; F 
        end
        ;; G
    end
end
;; H
;; IKJLMNO
block $HJ
    block $AH
        ;; A
        br_if $AH
        loop $loop2
            ;; B 
            loop $loop1
                ;; C
                ;; D
                ;; E
                ;; F 
            end
            ;; G
        end
    end
    ;; H
    br_if $HJ
    ;; I
    ;; K
end
;; J
;; LMNO
block $LN
    block $HJ
        block $AH
            ;; A
            br_if $AH
            loop $loop2
                ;; B 
                loop $loop1
                    ;; C
                    ;; D
                    ;; E
                    ;; F 
                end
                ;; G
            end
        end
        ;; H
        br_if $HJ
        ;; I
        ;; K
    end
    ;; J
    ;; L
    br_if $LN
    ;; M
end
;; N
;; O

And now we can translate the CFG into wasm by simply filling in the content of basic blocks.

Generating more human-readable WAT

Although we have generated the wasm code, the control flow structure we are using (br_if) to represent conditional branches is not easily understandable for humans; we prefer if-else.

We can observe that if a block has only one forward predecessor, and the predecessor is a branch block, it can be nested into the if or else branch of this predecessor.

So when performing a topological sort of the blocks, after we add a block, we first visit the successors for which the block is their only forward predecessor. This makes it possible to be nested in branches of this block.

1

Personally, I think how wasm handles control flow is a bit weird. block is a real instruction instead of some meta information on the code, br can mean either break or continue depending on whether it's in a block or loop, but it cannot be used to "branch" to an arbitrary place, and wasm provides both br_if and if, which seems non-orthogonal. Although many developers are asking for arbitrary labels and gotos to be added into wasm, I think it's hopeless because wasm has already gone too far to adopt such a significant design change.

2

Yes, if, else, and even end are "real" instructions, which have corresponding binary forms. It's quite strange for an "asm" to have these structures, but that's how wasm does things, and they are useful for a stack-based VM.

3

For those who are very familiar with the properties of CFG, you may have known that this is because the CFG is not reducible.

6

Or, the node "dominates" every node in the group of successors.

5

Or, in a more professional statement, strongly connected component, because there can be sub-cycles inside the outer cycle.

4

This algorithm was first introduced by a paper in 1973.

7

This means you should visit everything inside the loop first before going out of the loop when applying a DFS-based topological sort algorithm.

8

wasm doesn't seem to care much about how deeply blocks are nested.