NB

This article presumes you already know:

  • (LLVM-like) SSA-form IR.
  • Control-flow graphs.

This article will not cover:

  • How to compute the dominance relation between nodes.
  • How to compute the dominance frontier of a node.

I consider these implementation details: anyone can work out an algorithm that achieves these goals (though it might not be very efficient), and there are libraries available for out-of-the-box use. If you are implementing your own compiler and do want to implement an efficient algorithm yourself, you may refer to the paper A Simple, Fast Dominance Algorithm.

Instead, this article will focus on:

  • What is the dominance frontier and why do we need it?
  • Examples of how to use the dominance frontier to convert memory accesses into register operations, in particular how to insert phi nodes.

The problem

When compiling a high-level-language function into LLVM IR, a variable may be used as an l-value (assigned to) in different branches:

int f(int a, int b) {
    int c;
    if (a < b) {
        c = a;
    } else {
        c = b;
    }
    return c;
}

This can be translated into some IR[1] like this:

fn f(i32 %a, i32 %b) -> i32 {
    %c.addr = alloca i32
    blt %a, %b, if_taken, if_not_taken
if_taken:
    store %a, %c.addr
    j if_end
if_not_taken:
    store %b, %c.addr
if_end:
    %c = load %c.addr
    ret %c
}

This IR is correct, but notice that to avoid assigning to c in two different basic blocks, which would make the IR no longer valid SSA, we "cheat" by "leaking" it to memory and using load and store to access it. You can do this for every local variable you encounter when translating a high-level language to an SSA-like IR, and I can promise the functions will work as expected.

However, memory access is much slower than register access, so we do want to move those in-memory variables into registers[2] while keeping the IR in SSA form.

So those crazy computer scientists invented the infamous phi node.

PHINode - The PHINode class is used to represent the magical mystical PHI node, that can not exist in nature, but can be synthesized in a computer scientist's overactive imagination. —— LLVM project, comment on the PHINode class

With phi nodes, we can turn the IR into:

fn f(i32 %a, i32 %b) -> i32 {
    blt %a, %b, if_taken, if_not_taken
if_taken:
    %c0 = %a
    j if_end
if_not_taken:
    %c1 = %b
if_end:
    %c = phi if_taken.%c0, if_not_taken.%c1
    ret %c
}

%c = phi if_taken.%c0, if_not_taken.%c1 means %c equals %c0 or %c1, depending on which basic block the control flow comes from; with it, the alloca and the loads and stores are no longer needed.
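
(For reference, real LLVM IR spells this node as %c = phi i32 [ %a, %if_taken ], [ %b, %if_not_taken ], listing one incoming value per predecessor block.)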

So what a compiler needs to do is find out a way to generate this kind of phi node.

A naive way is: for each basic block that has multiple predecessors, replace each load with a phi node, and search backwards along the predecessor chains for the sources.

However, this algorithm is very inefficient. Luckily, there is an alternative, efficient way to achieve this goal.

Where to put phi nodes: the Dominance Frontier

Let's reconsider where we need to put a phi node.

We can start with a simple observation:

For an existing load statement, if the last store before it can be in two (or more) different basic blocks depending on the execution path, then we have to insert a phi node for it[3].

Then the problem is how to judge whether "the last store statement can be in two (or more) different basic blocks". By using a control-flow graph, we can express this as:

There exist two different paths from the entry block to the basic block the load statement is in, and the last stores on these paths are different.

Let's switch to the store's perspective:

A phi node needs to be placed for this store when there is a load of the same address in some basic block such that:

  • There exists a path that comes from the entry, passes through the basic block this store is in, and finally reaches the basic block the load is in.
  • There also exists a path that comes from the entry, does not pass through the basic block this store is in, and still reaches the basic block the load is in.

And obviously, the phi node should be placed at the basic block where these two paths first meet.

Such a basic block is exactly an element of the dominance frontier of the basic block the store statement is in. (Recall the definition: the dominance frontier of a block B is the set of blocks X such that B dominates some predecessor of X but does not strictly dominate X itself.)

Note that each newly generated phi node defines a new value, which must itself be treated as a store when inserting further phi nodes. Equivalently, you can compute the iterated dominance frontier (the closure of the dominance frontier under this rule) of the blocks containing stores.

So the process of finding places to insert phi nodes is simple:

// return a map which stands for:
//     variable -> basic blocks which need a phi node inserted for this variable
fn phi_insert_positions(function: &Function) -> Map<Variable, Set<&BasicBlock>> {
    let mut result = Map::new();
    for variable in function.alloca_variables() {
        // all blocks that contain a store to this variable
        let blocks_contain_store = function.blocks
            .filter(|block| block.contains_store_to(variable));
        // worklist of blocks whose dominance frontiers still need examining
        let mut blocks_to_consider = blocks_contain_store.clone();
        while !blocks_to_consider.is_empty() {
            let considering_block = blocks_to_consider.pop();
            for frontier_block in considering_block.dominance_frontier() {
                // skip frontier blocks we already decided to put a phi node in
                if result[variable].contains(frontier_block) {
                    continue;
                }
                result[variable].insert(frontier_block);
                // the new phi node is itself a definition of the variable, so
                // its block must be processed as if it contained a store
                if !blocks_contain_store.contains(frontier_block) {
                    blocks_to_consider.push(frontier_block);
                }
            }
        }
    }
    result
}
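
As a quick sanity check, apply this to the function f from the beginning of the article: the stores to c live in if_taken and if_not_taken, the dominance frontier of each of those blocks is {if_end}, so the algorithm reports that a phi node for c is needed in if_end, which is exactly where we placed it by hand.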

Example

Here is a simple example. Let's find out where to insert phi nodes for the variable a:

flowchart TD
    id0(Entry)
    id1[<span>bb1</span>\n store a, 1]
    id2[<span>bb2</span>\n %a0 = load a\n store a, %t0]
    id3[<span>bb3</span>\n%a1 = load a\n store a, %t1]
    id4[<span>bb4</span>\n...]
    id5[<span>bb5</span>\n%a2 = load a\n store a, 2]
    id6[<span>bb6</span>\n%a3 = load a\n store a, 3]
    id7[<span>bb7</span>\n%a4 = load a]
    id8[<span>bb8</span>\n%a5 = load a]
    id9(End)
    id0 --> id1
    id1 --> id2
    id1 --> id4
    id2 --> id3
    id3 --> id2
    id3 --> id8
    id4 --> id5
    id4 --> id6
    id5 --> id7
    id6 --> id7
    id7 --> id8
    id8 --> id9

So initially we need to consider bb1, bb2, bb3, bb5 and bb6, which contain store statements to a.
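
For reference, working the dominance-frontier definition above through this CFG gives: DF(bb1) = {}, DF(bb2) = DF(bb3) = {bb2, bb8}, DF(bb4) = {bb8}, DF(bb5) = DF(bb6) = {bb7}, DF(bb7) = {bb8}, and DF(bb8) = {}.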

For bb1, its dominance frontier is empty, so there is nothing to do for it.

For bb2, its dominance frontier is {bb2, bb8}. Note that bb2 itself is in the set, because bb2 can be reached both by the path Entry → bb1 → bb2, which doesn't pass through bb2 on the way, and by the path Entry → bb1 → bb2 → bb3 → bb2, which does.

So we insert phi nodes in bb2 and bb8:

flowchart TD id2[<span>bb2</span><p>%a_bb2 = phi ...</p> %a0 = load a\n store a, %t0] id8[<span>bb8</span><p>%a_bb8 = phi ...</p> %a5 = load a]

Note that we neither fill in the source values of the phi nodes nor remove the load and store statements here; we'll talk about these operations later.

Since bb8 is not in our blocks_contain_store, we need to add it to the consideration list. So our remaining consideration list is [bb3, bb5, bb6, bb8].

bb3's dominance frontier is also {bb2, bb8}, so nothing new happens here.

bb5's dominance frontier is {bb7}, so we add a phi node to bb7:

flowchart TD id7[<span>bb7</span><p>%a_bb7 = phi ...</p>%a4 = load a]

Also, bb7 needs to be pushed to the consideration list.

Now the consideration list is [bb6, bb8, bb7].

bb6's dominance frontier is also {bb7}, so nothing to do.

bb8's dominance frontier is empty, so nothing to do either.

bb7's dominance frontier is {bb8}, which has already been considered, so nothing to do.

After this process, we have found all the places where we need to put a phi node:

flowchart TD
    id0(Entry)
    id1[<span>bb1</span>\n store a, 1]
    id2[<span>bb2</span><p>%a_bb2 = phi ...</p> %a0 = load a\n store a, %t0]
    id3[<span>bb3</span>\n%a1 = load a\n store a, %t1]
    id4[<span>bb4</span>\n...]
    id5[<span>bb5</span>\n%a2 = load a\n store a, 2]
    id6[<span>bb6</span>\n%a3 = load a\n store a, 3]
    id7[<span>bb7</span><p>%a_bb7 = phi ...</p>%a4 = load a]
    id8[<span>bb8</span><p>%a_bb8 = phi ...</p> %a5 = load a]
    id9(End)
    id0 --> id1
    id1 --> id2
    id1 --> id4
    id2 --> id3
    id3 --> id2
    id3 --> id8
    id4 --> id5
    id4 --> id6
    id5 --> id7
    id6 --> id7
    id7 --> id8
    id8 --> id9

Decide which value to use[4]

Now we should decide the "sources" of the phi nodes, and in the meantime we can also:

  • remove the loads of the variable, replacing their results with the values produced by phi nodes or stores.
  • remove the stores.

We can do this by doing a DFS over the control-flow graph while keeping track of the current value of each variable.

enum Value {
    Constant(i64),
    Register(Register),
}

// decide a variable's current value by searching the value stack from the
// innermost frame outwards
fn decide_variable_value(
    variable: &Variable,
    current_variable_value: &Stack<Map<Variable, (&BasicBlock, Value)>>,
) -> (&BasicBlock, Value) {
    for frame in current_variable_value.rev() {
        if let Some(value) = frame.get(variable) {
            return value;
        }
    }
    // every variable is stored to before it is loaded, so this is unreachable
    unreachable!()
}

fn decide_values_start_from(
    function: &mut Function,
    block: &mut BasicBlock,
    visited: &mut Set<&BasicBlock>,
    current_variable_value: &mut Stack<Map<Variable, (&BasicBlock, Value)>>,
) {
    // we presume the (empty) phi nodes have already been inserted before
    // calling this function; now we fill in their sources
    for phi in &mut block.phis {
        let (current_value_from, current_value) = decide_variable_value(phi.variable, current_variable_value);
        phi.from.insert((current_value_from, current_value));
        // the phi result becomes the variable's current value in this block
        current_variable_value.last_mut()[phi.variable] = (block, phi.to);
    }
    // an already-visited block may still need a new phi source when we reach
    // it again from another predecessor, so the visited check comes after the
    // phi-source filling above
    if visited.contains(block) {
        return;
    }
    visited.insert(block);
    for statement in block {
        match statement {
            Load { to, from_variable, .. } => {
                // a load just reads the variable's current value: drop it and
                // replace every use of its result with that value
                let (_, load_value) = decide_variable_value(from_variable, current_variable_value);
                block.remove(statement);
                function.replace(to, load_value);
            },
            Store { value, to_variable, .. } => {
                // a store just updates the current value, then disappears
                current_variable_value.last_mut()[to_variable] = (block, value);
                block.remove(statement);
            },
            Branch { success_block, fail_block, .. } => {
                // each branch target gets its own frame, so values decided on
                // one path don't leak into the other
                current_variable_value.push(Map::new());
                decide_values_start_from(function, success_block, visited, current_variable_value);
                current_variable_value.pop();

                current_variable_value.push(Map::new());
                decide_values_start_from(function, fail_block, visited, current_variable_value);
                current_variable_value.pop();
            },
            Jump { to_block, .. } => {
                decide_values_start_from(function, to_block, visited, current_variable_value);
            },
            _ => (),
        }
    }
}
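
To show how the two halves fit together, here is a minimal driver sketch under the same pseudocode conventions as above; insert_empty_phi and entry_block are hypothetical helpers that are not defined in this article:

fn mem2reg(function: &mut Function) {
    // step 1: decide where phi nodes are needed
    let positions = phi_insert_positions(function);
    // step 2: insert empty phi nodes (their sources are filled in later)
    for (variable, blocks) in &positions {
        for block in blocks {
            block.insert_empty_phi(variable);
        }
    }
    // step 3: fill in phi sources and rewrite loads/stores, starting from the
    // entry block with a single empty value-stack frame
    let mut visited = Set::new();
    let mut current_variable_value = Stack::new();
    current_variable_value.push(Map::new());
    decide_values_start_from(function, function.entry_block(), &mut visited, &mut current_variable_value);
}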

Example

We still use the control-flow graph from the phi-insertion process above; this time I'll add some statements that "use" the variable a, to show the result of the value replacement.

flowchart TD
    id0(Entry)
    id1[<span>bb1</span>\n store a, 1]
    id2[<span>bb2</span><p>%a_bb2 = phi ...</p> %a0 = load a\n %t0 = add %a0, 1 \n store a, %t0]
    id3[<span>bb3</span>\n%a1 = load a\n %t1 = add %a1, %a1 \n store a, %t1]
    id4[<span>bb4</span>\n...]
    id5[<span>bb5</span>\n%a2 = load a\n store a, 2]
    id6[<span>bb6</span>\n%a3 = load a\n store a, 3]
    id7[<span>bb7</span><p>%a_bb7 = phi ...</p>%a4 = load a]
    id8[<span>bb8</span><p>%a_bb8 = phi ...</p>%a5 = load a\n ret %a5]
    id9(End)
    id0 --> id1
    id1 --> id2
    id1 --> id4
    id2 --> id3
    id3 --> id2
    id3 --> id8
    id4 --> id5
    id4 --> id6
    id5 --> id7
    id6 --> id7
    id7 --> id8
    id8 --> id9

We start the DFS from bb1; initially, the value stack of a is [].

When visiting bb1, we encounter a store that sets a to 1, so after visiting bb1 the value stack of a is [(bb1, 1)], and the store statement is removed.

flowchart TD id1[<span>bb1</span>\n...]

bb1 ends with a Branch (not shown on the graph, but you can tell because the node has two outgoing edges).

First, we visit bb2, and we'll come back to bb4 later.

In bb2 we first encounter a phi node, so we add the current value of a (1, from bb1) to its sources.

Then we meet a load, which puts the value of a into %a0. Since a is currently 1, we can remove this load and replace all occurrences of %a0 with 1, which is exactly what happens in the next statement.

And then we have a store, which updates the value stack to [(bb1, 1), (bb2, %t0)].

After all these things, we now have[5]:

flowchart TD id2[<span>bb2</span><p>%a_bb2 = phi bb1.1</p> %t0 = add 1, 1]

Then we go to bb3. Similar to bb2, we can replace %a1 with %t0 and update the value stack to [(bb1, 1), (bb2, %t0), (bb3, %t1)].

flowchart TD id3[<span>bb3</span>\n %t1 = add %t0, %t0]

Then one branch leads us back to bb2. Since bb2 has already been visited, we only need to update its phi's sources with the current value of a.

flowchart TD id2[<span>bb2</span><p>%a_bb2 = phi bb1.1, bb3.%t1 </p> %t0 = add 1, 1]

Then we return to bb3 and take the other branch, which leads us to bb8 and turns it into:

flowchart TD id8[<span>bb8</span><p>%a_bb8 = phi bb3.%t1</p>ret %a_bb8]

(Remember the value stack becomes [(bb1, 1), (bb2, %t0), (bb3, %t1), (bb8, %a_bb8)] after visiting the phi statement.)

Then we go back and visit bb4 from bb1, and continue with bb5, bb7 and bb6, filling in the remaining phi sources along the way.

Finally, we'll get:

flowchart TD
    id0(Entry)
    id1[<span>bb1</span>\n ...]
    id2[<span>bb2</span><p>%a_bb2 = phi bb1.1, bb3.%t1 </p> %t0 = add 1, 1]
    id3[<span>bb3</span>\n %t1 = add %t0, %t0]
    id4[<span>bb4</span>\n...]
    id5[<span>bb5</span>\n...]
    id6[<span>bb6</span>\n...]
    id7[<span>bb7</span><p>%a_bb7 = phi bb5.2, bb6.3</p>]
    id8[<span>bb8</span><p>%a_bb8 = phi bb3.%t1, bb7.%a_bb7</p>ret %a_bb8]
    id9(End)
    id0 --> id1
    id1 --> id2
    id1 --> id4
    id2 --> id3
    id3 --> id2
    id3 --> id8
    id4 --> id5
    id4 --> id6
    id5 --> id7
    id6 --> id7
    id7 --> id8
    id8 --> id9

Now all loads and stores of variable a have been eliminated, we no longer need the alloca for a, and the function is ready for further optimization.

Summary

This article introduced how to promote in-memory variables to registers, i.e. how to eliminate loads and stores and insert phi nodes in SSA-form IR, and gave a detailed example showing how to do it.

[1] This is an LLVM-like IR. alloca here means we reserve some space on the stack, which we can then store values into and load from; I assume everything else is simple enough to be understood by anyone who knows what SSA is.

[2] Though these registers are only logical, i.e. they may still be spilled to the stack during codegen, promoting these variables to logical registers lets them take part in further optimizations and makes the codegen process much easier.

[3] Note that the phi node doesn't have to be put in the basic block the load statement is in; we can put it in a basic block that comes after the store but dominates the load.

[4] LLVM calls this process "renaming".

[5] Here we have a good example of why mem2reg is important: obviously, %t0 can be replaced with the constant 2, which would be hard to discover before mem2reg.