NB
This article presumes you know what is:
- (LLVM-liked) SSA form IR.
- Control-flow graph.
This article will not cover:
- How to find out the dominant relation between nodes.
- How to find the dominance frontier for nodes.
I think they are just implementation details, anyone can find out an algorithm to achieve these goals (though might be not very efficient), and there exists some library for out-of-the-box use. If you are implementing your own compiler and do want to implement an efficient algorithm by yourself, you may refer to the paper A Simple, Fast Dominance Algorithm.
Instead, this article will focus on:
- What is the dominance frontier and why do we need it?
- Giving examples on how to use the dominance frontier to convert memory access to register operation, especially insert phi nodes.
The problem
When compiling a high-level language function into LLVM IR, a variable may be used as l-value in different branches:
int f(int a, int b) {
int c;
if (a < b) {
c = a;
} else {
c = b;
}
return c;
}
Which can be translated into some IR1 like this:
fn f(i32 %a, i32 %b) -> i32 {
%c.addr = alloca i32
blt %a, %b, if_taken, if_not_taken
if_taken:
store %a, %c.addr
j if_end
if_not_taken:
store %b, %c.addr
if_end:
%c = load %c.addr
ret %c
}
This IR is correct, but you may see that to prevent assigning to c
in two different basic blocks, which makes it no longer a valid SSA, we "cheat" by "leaking" it to the memory and using load
and store
to access it. You may do this for every local variable you faced when translating a high-level language to SSA-like IR, I can promise the functions will work as expected.
However, memory access is always much slower than register access. So we do want to try to move those in-memory variables to the registers2 and keep it an SSA meanwhile.
So those crazy computer scientists invented the infamous phi
node.
PHINode - The PHINode class is used to represent the magical mystical PHI node, that can not exist in nature, but can be synthesized in a computer scientist's overactive imagination. —— LLVM project, comment of the PHINode class
With phi
node, we can make the ir into:
fn f(i32 %a, i32 %b) -> i32 {
%c.addr = alloca i32
blt %a, %b, if_taken, if_not_taken
if_taken:
%c0 = %a
j if_end
if_not_taken:
%c1 = %b
if_end:
%c = phi if_taken.%c0, if_not_taken.%c1
ret %c
}
%c = phi if_taken.%c0, if_not_taken.%c1
means %c
equals %c0
or %c1
, depending which basic block the control flow comes from.
So what a compiler needs to do is find out a way to generate this kind of phi node.
A dumb way is, for each basic block that has multiple predecessors, replace each load
with a phi node, and search along the predecessor chain for the source.
However this algorithm is very inefficient, luckily, there's an alternative, efficient way to archive this goal.
Where to put phi
nodes ——— Dominance Frontier
Let's re-consider where we need to put a phi node.
We can start with a trivial judgment:
For an existing load
statement, if the last store
statement can be in two (or more) different basic blocks, then we have to insert a phi node for it3.
Then the problem is how to judge whether "the last store
statement can be in two (or more) different basic blocks".
By using a control-flow graph, we can express this as:
There exist two different paths from the entry block to the basic block the load
statement is in, and the last store
s on these paths are different.
Let's switch to the store
's perspective:
A phi node needs to be placed for this store
in some basic block that contains a load
to the same address when:
- There exists a path that comes from the entry, passes the basic block this
store
is in, and finally reaches the basic block theload
is in. - There also exists a path that comes from the entry, but not passing the basic block this
store
is in and reaches the basic block theload
is in.
And obviously, the phi node should be placed at the basic node where these two paths meet with each other.
Such a basic block is exactly one of the elements in the dominance frontier of the basic block the store
statement is in.
Note that newly generated phi nodes will create a new register, which is necessary to be considered (regarded as a store
) when trying to insert other phi nodes. You may also do this by calculating the dominant frontier closure of the basic block of the store
basic block.
So the process of finding places to insert phi
nodes is simple:
// return a map which stands for:
// variable -> basic blocks which need to insert a phi node for this variable
fn phi_insert_positions(function: &Function) -> Map<Variable, Set<&BasicBlock>> {
let mut result = Map::new();
for variable in function.alloca_variables() {
let blocks_contain_store = function.blocks
.filter(|block| block.contains_store_to(variable));
let mut blocks_to_consider = blocks_contain_store.clone();
while !blocks_to_consider.empty() {
let considering_block = block_to_consider.pop();
for dominator_frontier in considering_block.dorminate_frontiers() {
result[variable].insert(dominator_frontier);
if !blocks_contain_store.contains(dominator_frontier) {
blocks_to_consider.push(dominator_frontier);
}
}
}
}
result
}
Example
Here is a simple example, let's find out where to insert phi nodes for the variable a
:
So initially we need to consider bb1
, bb2
, bb3
, bb5
and bb6
, which contain store
statements to a
.
For bb1
, its domination frontiers set is empty, so nothing to do for it.
For bb2
, its domination frontiers set is {bb2
, bb8
}, note bb2
itself is in the set, because we can reach bb2
in the path Entry
→ bb1
→ bb2
, which doesn't pass bb2
, and in the path Entry
→ bb1
→ bb2
→ bb3
→ bb2
, which passes bb2
.
So we insert phi
nodes in bb2
and bb8
:
Note we neither fill the source value for phi
nodes nor remove the load
and store
statements here, we'll talk about these operations later.
Here bb8
is not in our blocks_contain_store
, we need to add it to the consideration list. So our remaining consider list is [bb3, bb5, bb6, bb8]
.
bb3
's domination frontiers set is also {bb2
, bb8
}, so nothing meaningful is done here.
bb5
's domination frontiers set is {bb7
}, so we add a phi
node to bb7
:
Also, bb7
needs to be pushed to the consideration list.
Now the consideration list is [bb6, bb8, bb7]
.
bb6
's domination frontiers set is also {bb7
}, so nothing to do.
bb8
's domination frontier is empty, also nothing to do.
bb7
's domination frontier is {bb8
}, which is already considered, so nothing to do.
After this process, we have found all places which need to put a phi
node:
Decide which value to use4
Now we should decide the "sources" of the phi
nodes, and in the meantime, we can also
- remove the
load
s to the variable and replace it with the values generated byphi
orstore
. - remove the
store
s.
We can do this by doing DFS over the control flow graph, and keeping track of the current value of each variable.
enum Value {
Constant(i64),
Register(Register),
}
// decide a variable's current value with a variable value stack
fn decide_variable_value(
variable: &Variable,
current_variable_value: &Stack<Map<Variable, (&BasicBlock, Value)>>,
) -> (&BasicBlock, Value) {
for frame in current_variable_value.rev() {
if let Some(value) = frame.get(variable_name) {
return value;
}
}
}
fn decide_values_start_from(
function: &mut Function,
block: &mut BasicBlock,
visited: Set<&BasicBlock>,
current_variable_value: &mut Stack<Map<Variable, (&BasicBlock, Value)>>,
) {
// we presume phi nodes have already inserted before calling this function
// now we should insert phi sources
for phi in &mut block.phis {
let (current_value_from, current_value) = decide_variable_value(phi.variable, current_variable_value);
phi.from.insert((current_value_from, current_value));
current_variable_value.top()[phi.variable] = (block, phi.to);
}
// it's necessary to revisit a visited block when inserting phi nodes
// so we need to put the visited check after the phi source putting
if visited.contains(block) {
return;
}
visited.insert(block);
for statement in block {
match statement {
Load { to, from_variable, .. } => {
let (_, load_value) = decide_variable_value(from_variable, current_variable_value);
block.remove(statement);
function.replace(to, load_value);
},
Store { value, to_variable, .. } => {
current_variable_value.last_mut()[to_variable] = value;
block.remove(statement);
},
Branch { success_block, fail_block, .. } => {
current_variable_value.push(Map::new());
decide_values_start_from(function, success_block, visited, current_variable_value);
current_variable_value.pop();
current_variable_value.push(Map::new());
decide_values_start_from(function, fail_block, visited, current_variable_value);
current_variable_value.pop();
}
Jump { to_block, .. } => {
decide_values_start_from(function, to_block, visited, current_variable_value);
},
_ => (),
}
}
}
Example
We still use the control flow graph mentioned in the phi
-inserting process, this time I'll add some statements that "use" the variable a
to show the variable value replacement result.
We start the DFS process from bb1
, currently, the value stack of a is []
.
When visiting bb1
, we meet up with a store
that set a
to 1
, so after visiting bb1
, the value stack of a
is [(bb1, 1)]
. And the store
statement is removed.
bb1
ends with a Branch
(didn't show on the graph, but you can tell that because the node has two outgoing edges).
First, we visit bb2
, and we'll come back to bb4
later.
In bb2
we first met up with a phi
node, so we add the current value of a
(1, from bb1
) to its from
.
Then we meet a load
, which put the value of a
to a0
, since a
is currently 1
, we can remove this load
and use 1
to replace all occurrences of a0
, this is just what we want to do for the next statement.
And then we have a store
, which updates the value stack to [(bb1, 1), (bb2, %t0)]
.
After all these things, we now have:
Then we go to bb3
, similar to bb2
, we can replace %a1
with %t0
, and update the value stack to [(bb1, 1), (bb2, %t0), (bb3, %t1)]
.
And then we are led back to bb2
, since bb2
is already visited, we just need to update phi
's from with the current value of a
now.
And then we come back to bb3
and take another branch, which leads us to bb8
, and make it:
(Remember the value stack becomes [(bb1, 1), (bb2, %t0), (bb3, %t1), (bb8, %a_bb8)]
after visiting the phi
statement.)
Then we go back to visit bb4
from bb1
, and then bb5
, bb7
and bb6
.
Finally, we'll get:
Now all load
s and store
s for variable a
are eliminated, and we no longer need alloca
for variable a
. And the function is ready for further optimization.
Summary
This article introduces how to promote variables in memory to register, ie. eliminate load
s and store
s and insert phi
s in SSA form IR. And give a detailed example to show how to do it.
This is an LLVM-like IR, alloca
here means we will save some space on the stack, and we can store
values into this space and load
from it, I assume every other thing is simple enough to be understood by anyone who knows what SSA is.
Though these registers are just logical, i.e. they can still be leaked to the stack when doing codegen, optimizing them into logical registers can engage these variables in further optimization and make the codegen process much easier.
Note the phi node doesn't have to be put in the basic block the load
statement is in, we can put it in a basic block that after this store but dominate this load
.
LLVM calls this process "renaming".
Here we have a good example of why mem2reg is important: obviously, t0
can be replaced with constant value 2
, which is hard to be discovered before mem2reg.