## NB

This article presumes you already know:

- (LLVM-like) SSA-form IR.
- Control-flow graphs.

This article will **not** cover:

- **How** to find the dominance relation between nodes.
- **How** to find the dominance frontier of nodes.

I think these are just implementation details: anyone can come up with an algorithm to achieve these goals (though it might not be very efficient), and there exist libraries for out-of-the-box use. If you are implementing your own compiler and do want an efficient algorithm of your own, you may refer to the paper *A Simple, Fast Dominance Algorithm*.

Instead, this article will focus on:

- **What** is the dominance frontier, and **why** do we need it?
- **Examples** of how to use the dominance frontier to convert memory accesses into register operations, especially how to insert phi nodes.

## The problem

When compiling a high-level-language function into LLVM IR, a variable may be used as an l-value in different branches:

```
int f(int a, int b) {
    int c;
    if (a < b) {
        c = a;
    } else {
        c = b;
    }
    return c;
}
```

Which can be translated into some IR^{1} like this:

```
fn f(i32 %a, i32 %b) -> i32 {
    %c.addr = alloca i32
    blt %a, %b, if_taken, if_not_taken
if_taken:
    store %a, %c.addr
    j if_end
if_not_taken:
    store %b, %c.addr
if_end:
    %c = load %c.addr
    ret %c
}
```

This IR is correct, but note that to avoid assigning to `c` in two different basic blocks, which would make the IR no longer valid SSA, we "cheat" by "leaking" `c` to memory and accessing it through `load` and `store`. You can do this for every local variable you encounter when translating a high-level language into an SSA-like IR, and I can promise the functions will work as expected.

However, memory access is always much slower than register access, so we want to move those in-memory variables into registers^{2} while keeping the IR in SSA form.

So those crazy computer scientists invented the infamous `phi` node.

> PHINode - The PHINode class is used to represent the magical mystical PHI node, that can not exist in nature, but can be synthesized in a computer scientist's overactive imagination.
>
> —— LLVM project, comment on the PHINode class

With the `phi` node, we can turn the IR into:

```
fn f(i32 %a, i32 %b) -> i32 {
    blt %a, %b, if_taken, if_not_taken
if_taken:
    %c0 = %a
    j if_end
if_not_taken:
    %c1 = %b
if_end:
    %c = phi if_taken.%c0, if_not_taken.%c1
    ret %c
}
```

`%c = phi if_taken.%c0, if_not_taken.%c1` means `%c` equals `%c0` or `%c1`, depending on which basic block the control flow comes from.

So what a compiler needs to do is find a way to generate this kind of phi node.

A naive way is, for each basic block that has multiple predecessors, to replace each `load` with a phi node and search along the predecessor chains for the source values. However, this algorithm is very inefficient; luckily, there's an alternative, efficient way to achieve this goal.

## Where to put `phi` nodes: the dominance frontier

Let's reconsider where we need to put a phi node.

We can start with a trivial observation:

For an existing `load` statement, if the last `store` statement can be in two (or more) different basic blocks, then we have to insert a phi node for it^{3}.

Then the problem is how to judge whether "the last `store` statement can be in two (or more) different basic blocks". Using the control-flow graph, we can express this as:

There exist two different paths from the entry block to the basic block the `load` statement is in, and the last `store`s on these paths are different.

Let's switch to the `store`'s perspective: a phi node needs to be placed for this `store` in some basic block that contains a `load` of the same address when:

- There exists a path that starts from the entry, passes through the basic block this `store` is in, and finally reaches the basic block the `load` is in.
- There also exists a path that starts from the entry, does **not** pass through the basic block this `store` is in, and still reaches the basic block the `load` is in.

And obviously, the phi node should be placed at the basic block where these two paths meet.

Such a basic block is exactly one of the elements of the **dominance frontier** of the basic block the `store` statement is in.

Note that a newly generated phi node defines a new register, which must itself be considered (regarded as a `store`) when inserting further phi nodes. You can handle this by computing the dominance-frontier closure of the `store`'s basic block.

So the process of finding the places to insert `phi` nodes is simple:

```
// returns a map:
// variable -> basic blocks which need a phi node inserted for this variable
fn phi_insert_positions(function: &Function) -> Map<Variable, Set<&BasicBlock>> {
    let mut result = Map::new();
    for variable in function.alloca_variables() {
        let mut blocks_contain_store = function.blocks
            .filter(|block| block.contains_store_to(variable));
        let mut blocks_to_consider = blocks_contain_store.clone();
        while !blocks_to_consider.is_empty() {
            let considering_block = blocks_to_consider.pop();
            for dominance_frontier in considering_block.dominance_frontiers() {
                result[variable].insert(dominance_frontier);
                if !blocks_contain_store.contains(dominance_frontier) {
                    // mark it as seen so it is never pushed twice,
                    // which also guarantees termination on loops
                    blocks_contain_store.insert(dominance_frontier);
                    blocks_to_consider.push(dominance_frontier);
                }
            }
        }
    }
    result
}
```
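The pseudocode above can be tried out concretely. Below is a runnable sketch in Python (all names are my own, not from any real library) that computes dominators and dominance frontiers with a naive quadratic method, not the fast algorithm from the paper mentioned earlier, and then runs the same worklist on the CFG of the `f()` example:

```python
def dominators(succs, entry):
    """Naive iterative dominators: dom(n) = blocks on every path entry -> n.
    Assumes every block is reachable from entry."""
    nodes = set(succs)
    preds = {n: [p for p in nodes if n in succs[p]] for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, preds

def dominance_frontiers(succs, entry):
    """DF(n) = blocks y where n dominates a predecessor of y
    but does not strictly dominate y itself."""
    dom, preds = dominators(succs, entry)
    df = {n: set() for n in succs}
    for y in succs:
        for p in preds[y]:
            for n in dom[p]:                 # n dominates p
                if n not in dom[y] - {y}:    # n doesn't strictly dominate y
                    df[n].add(y)
    return df

def phi_insert_positions(store_blocks, succs, entry):
    """The worklist from the article: the DF closure of the store blocks."""
    df = dominance_frontiers(succs, entry)
    result, worklist, seen = set(), list(store_blocks), set(store_blocks)
    while worklist:
        block = worklist.pop()
        for frontier in df[block]:
            result.add(frontier)
            if frontier not in seen:
                seen.add(frontier)
                worklist.append(frontier)
    return result

# The CFG of f(): both branches store to %c.addr.
succs = {
    "entry": ["if_taken", "if_not_taken"],
    "if_taken": ["if_end"],
    "if_not_taken": ["if_end"],
    "if_end": [],
}
print(phi_insert_positions({"if_taken", "if_not_taken"}, succs, "entry"))
# → {'if_end'}: the phi for %c goes exactly where the two branches merge
```

This matches the hand-written IR above: the only phi needed for `%c` sits in `if_end`.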

### Example

Here is a simple example. Let's find out where to insert phi nodes for the variable `a`:

So initially we need to consider `bb1`, `bb2`, `bb3`, `bb5` and `bb6`, which contain `store` statements to `a`.

For `bb1`, its dominance frontier set is empty, so there is nothing to do for it.

For `bb2`, its dominance frontier set is {`bb2`, `bb8`}. Note that `bb2` itself is in the set, because we can reach `bb2` along the path `Entry` → `bb1` → `bb2`, which doesn't pass through `bb2` before reaching it, and along the path `Entry` → `bb1` → `bb2` → `bb3` → `bb2`, which does.

So we insert `phi` nodes in `bb2` and `bb8`:

Note that we neither fill in the source values of the `phi` nodes nor remove the `load` and `store` statements here; we'll talk about these operations later.

Since `bb8` is not in our `blocks_contain_store`, we need to add it to the consideration list. So our remaining consideration list is `[bb3, bb5, bb6, bb8]`.

`bb3`'s dominance frontier set is also {`bb2`, `bb8`}, so nothing new happens here.

`bb5`'s dominance frontier set is {`bb7`}, so we add a `phi` node to `bb7`:

Also, `bb7` needs to be pushed onto the consideration list. Now the consideration list is `[bb6, bb8, bb7]`.

`bb6`'s dominance frontier set is also {`bb7`}, so nothing to do.

`bb8`'s dominance frontier set is empty, so again nothing to do.

`bb7`'s dominance frontier set is {`bb8`}, which is already considered, so nothing to do.

After this process, we have found all the places which need a `phi` node:

### Decide which value to use^{4}

Now we should decide the "sources" of the `phi` nodes, and in the meantime we can also:

- remove the `load`s of the variable and replace their results with the values produced by `phi` nodes or `store`s.
- remove the `store`s.

We can do this with a DFS over the control-flow graph, keeping track of the current value of each variable.

```
enum Value {
    Constant(i64),
    Register(Register),
}

// decide a variable's current value with a stack of value frames
fn decide_variable_value(
    variable: &Variable,
    current_variable_value: &Stack<Map<Variable, (&BasicBlock, Value)>>,
) -> (&BasicBlock, Value) {
    for frame in current_variable_value.rev() {
        if let Some(value) = frame.get(variable) {
            return value;
        }
    }
    unreachable!("every load is dominated by a store or a phi")
}

fn decide_values_start_from(
    function: &mut Function,
    block: &mut BasicBlock,
    visited: &mut Set<&BasicBlock>,
    current_variable_value: &mut Stack<Map<Variable, (&BasicBlock, Value)>>,
) {
    // we presume phi nodes have already been inserted before calling this
    // function; now we fill in the phi sources
    for phi in &mut block.phis {
        let (current_value_from, current_value) =
            decide_variable_value(phi.variable, current_variable_value);
        phi.from.insert((current_value_from, current_value));
        current_variable_value.top()[phi.variable] = (block, phi.to);
    }
    // it's necessary to revisit a visited block to add phi sources,
    // so the visited check comes after the phi handling above
    if visited.contains(block) {
        return;
    }
    visited.insert(block);
    for statement in block {
        match statement {
            Load { to, from_variable, .. } => {
                let (_, load_value) =
                    decide_variable_value(from_variable, current_variable_value);
                block.remove(statement);
                function.replace(to, load_value);
            },
            Store { value, to_variable, .. } => {
                current_variable_value.top()[to_variable] = (block, value);
                block.remove(statement);
            },
            Branch { success_block, fail_block, .. } => {
                current_variable_value.push(Map::new());
                decide_values_start_from(function, success_block, visited, current_variable_value);
                current_variable_value.pop();
                current_variable_value.push(Map::new());
                decide_values_start_from(function, fail_block, visited, current_variable_value);
                current_variable_value.pop();
            },
            Jump { to_block, .. } => {
                decide_values_start_from(function, to_block, visited, current_variable_value);
            },
            _ => (),
        }
    }
}
```
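To see this renaming DFS in action, here is a heavily simplified, runnable Python sketch of the same idea, applied to the `f()` example after phi insertion. Only `store` and `phi` statements are modeled (removed `load`s are replaced on the fly in the real pass), and all names are made up for illustration:

```python
# CFG and statements of f() after phi insertion.
succs = {
    "entry": ["if_taken", "if_not_taken"],
    "if_taken": ["if_end"],
    "if_not_taken": ["if_end"],
    "if_end": [],
}
# statements per block: ("store", variable, value) or ("phi", variable, dest)
code = {
    "entry": [],
    "if_taken": [("store", "c", "%a")],
    "if_not_taken": [("store", "c", "%b")],
    "if_end": [("phi", "c", "%c")],
}
phi_sources = {"%c": set()}  # phi dest -> set of (source block, value)

def lookup(var, stack):
    # walk the value-frame stack from innermost to outermost
    for frame in reversed(stack):
        if var in frame:
            return frame[var]
    raise KeyError(var)

def rename(block, visited, stack):
    # phi sources are filled even on revisits, before the visited check
    for op, var, dest in code[block]:
        if op == "phi":
            source_block, value = lookup(var, stack)
            phi_sources[dest].add((source_block, value))
            stack[-1][var] = (block, dest)
    if block in visited:
        return
    visited.add(block)
    for op, var, value in code[block]:
        if op == "store":
            stack[-1][var] = (block, value)  # the store itself is deleted
    for succ in succs[block]:
        stack.append({})                     # one frame per outgoing edge
        rename(succ, visited, stack)
        stack.pop()

rename("entry", set(), [{}])
print(sorted(phi_sources["%c"]))
# → [('if_not_taken', '%b'), ('if_taken', '%a')]
```

The phi in `if_end` ends up with exactly the two sources from the hand-written IR: `if_taken.%a` and `if_not_taken.%b`.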

### Example

We still use the control flow graph mentioned in the `phi`

-inserting process, this time I'll add some statements that "use" the variable `a`

to show the variable value replacement result.

We start the DFS process from `bb1`

, currently, the value stack of a is `[]`

.

When visiting `bb1`, we meet a `store` that sets `a` to `1`, so after visiting `bb1` the value stack of `a` is `[(bb1, 1)]`, and the `store` statement is removed.

`bb1` ends with a `Branch` (not shown on the graph, but you can tell because the node has two outgoing edges). First we visit `bb2`; we'll come back to `bb4` later.

In `bb2` we first meet a `phi` node, so we add the current value of `a` (`1`, from `bb1`) to its `from` set.

Then we meet a `load`, which puts the value of `a` into `a0`. Since `a` is currently `1`, we can remove this `load` and replace all occurrences of `a0` with `1`; this is exactly what we want to do for the next statement.

Then we have a `store`, which updates the value stack to `[(bb1, 1), (bb2, %t0)]`.

After all these things, we now have:

^{5}

Then we go to `bb3`; similar to `bb2`, we can replace `%a1` with `%t0` and update the value stack to `[(bb1, 1), (bb2, %t0), (bb3, %t1)]`.

Then we are led back to `bb2`; since `bb2` is already visited, we just need to update the `phi`'s `from` with the current value of `a`.

Then we come back to `bb3` and take the other branch, which leads us to `bb8`, and makes it:

(Remember the value stack becomes `[(bb1, 1), (bb2, %t0), (bb3, %t1), (bb8, %a_bb8)]` after visiting the `phi` statement.)

Then we go back to visit `bb4` from `bb1`, and then `bb5`, `bb7` and `bb6`.

Finally, we'll get:

Now all `load`s and `store`s of variable `a` are eliminated, and we no longer need the `alloca` for `a`. The function is ready for further optimization.

## Summary

This article introduced how to promote variables from memory into registers, i.e. how to eliminate `load`s and `store`s and insert `phi`s in SSA-form IR, and gave a detailed example showing how to do it.

^{1}

This is an LLVM-like IR. `alloca` here means we reserve some space on the stack; we can `store` values into this space and `load` from it. I assume everything else is simple enough to be understood by anyone who knows what SSA is.

^{2}

Though these registers are only logical, i.e. they can still be spilled to the stack during codegen, promoting these variables into logical registers lets them take part in further optimization and makes the codegen process much easier.

^{3}

Note the phi node doesn't have to be put in the basic block the `load` statement is in; we can put it in any basic block that comes after the `store` but dominates the `load`.

^{4}

LLVM calls this process "renaming".

^{5}

Here we have a good example of why mem2reg is important: obviously, `t0` can be replaced with the constant value `2`, which is hard to discover before mem2reg.