Causal graph (causal DAG)

Causal graphs (also known as path diagrams, causal Bayesian networks or DAGs) are probabilistic graphical models used to encode assumptions about the data-generating process.

The causal graph can be drawn in the following way. Each variable in the model has a corresponding vertex or node and an arrow is drawn from a variable X to a variable Y whenever Y is judged to respond to changes in X when all other variables are being held constant. Variables connected to Y through direct arrows are called parents of Y, or "direct causes of Y," and are denoted by Pa(Y).
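As a minimal sketch of this construction, the graph can be stored as a list of directed edges, and Pa(Y) read off as the set of nodes with an arrow into Y. The variable names and the `parents` helper are illustrative assumptions, not part of any library.

```python
# Hypothetical example graph: each pair (a, b) means an arrow a -> b.
EDGES = [("Z", "X"), ("Z", "Y"), ("X", "Y")]


def parents(edges, node):
    """Return Pa(node): every variable with an arrow pointing directly into node."""
    return {a for a, b in edges if b == node}


print(sorted(parents(EDGES, "Y")))  # ['X', 'Z']
print(sorted(parents(EDGES, "Z")))  # []
```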

Put simply, a causal graph is a directed acyclic graph used to describe causal relationships.

Confounding

Let’s define confounding as any context in which the association between an outcome Y and a predictor of interest X is not the same as it would be, if we had experimentally determined the values of X.

Let X be some independent variable, and Y some dependent variable. To estimate the effect of X on Y, the statistician must suppress the effects of extraneous variables that influence both X and Y. We say that X and Y are confounded by some other variable Z whenever Z causally influences both X and Y.

Put simply, when studying the effect of $X$ on $Y$, if some other variable $Z$ influences the relationship between $X$ and $Y$, then $Z$ is a confounder of $X$ and $Y$.

Left unaddressed, confounding will produce false inferences about causal effects.
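To make the distortion concrete, here is a small simulation sketch in which $Z$ drives both $X$ and $Y$ while $X$ has no effect on $Y$ at all. The coefficients, sample size, and helper names are assumptions chosen for illustration; only the Python standard library is used.

```python
import random


def correlation(xs, ys):
    """Pearson correlation computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_confounded(n=20000, seed=0):
    """Z -> X and Z -> Y, with no arrow from X to Y."""
    rng = random.Random(seed)
    zs = [rng.gauss(0, 1) for _ in range(n)]
    xs = [z + rng.gauss(0, 1) for z in zs]  # X depends only on Z
    ys = [z + rng.gauss(0, 1) for z in zs]  # Y depends only on Z
    return xs, ys


xs, ys = simulate_confounded()
r = correlation(xs, ys)
# X and Y are clearly associated (theory gives 0.5 for these coefficients),
# even though the true causal effect of X on Y is zero.
print(round(r, 2))
```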

There are four elemental confounding structures.

Fork

Pipe / Mediator

Collider

Descendant

Here $ConfoundVariable2$ is the descendant.
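The four structures can be written down as tiny edge sets. The sketch below uses hypothetical node names ($X$, $Y$, $Z$, $D$) and attaches the descendant to a collider, which is one common way to illustrate it; `classify` and `descendants` are illustrative helpers, not a library API.

```python
# Each pair (a, b) means an arrow a -> b. Z is the middle variable throughout.
FORK = {("Z", "X"), ("Z", "Y")}        # X <- Z -> Y
PIPE = {("X", "Z"), ("Z", "Y")}        # X -> Z -> Y (Z is a mediator)
COLLIDER = {("X", "Z"), ("Y", "Z")}    # X -> Z <- Y
DESCENDANT = COLLIDER | {("Z", "D")}   # D is a descendant of Z


def classify(edges, left, mid, right):
    """Classify mid on the triple left - mid - right."""
    if (mid, left) in edges and (mid, right) in edges:
        return "fork"
    if (left, mid) in edges and (right, mid) in edges:
        return "collider"
    if ((left, mid) in edges and (mid, right) in edges) or (
        (right, mid) in edges and (mid, left) in edges
    ):
        return "pipe"
    raise ValueError("left, mid and right are not connected through mid")


def descendants(edges, node):
    """Every node reachable from node by following arrows forward."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for a, b in edges:
            if a == current and b not in found:
                found.add(b)
                frontier.append(b)
    return found


print(classify(FORK, "X", "Z", "Y"))      # fork
print(classify(PIPE, "X", "Z", "Y"))      # pipe
print(classify(COLLIDER, "X", "Z", "Y"))  # collider
print(descendants(DESCENDANT, "Z"))       # {'D'}
```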

Removing the influence of confounding on causal relationships between variables

Path

Ignoring the direction of the edges in the DAG, if $Y$ can be reached from $X$, then there is a path from $X$ to $Y$.
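A minimal sketch of this definition: treat every edge as undirected and enumerate the simple (non-repeating) paths between two nodes. The example graph and the `undirected_paths` helper are assumptions made for illustration.

```python
def undirected_paths(edges, start, end):
    """List every simple path from start to end, ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    paths = []

    def walk(path):
        node = path[-1]
        if node == end:
            paths.append(path)
            return
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in path:  # never revisit a node on the same path
                walk(path + [nxt])

    walk([start])
    return paths


# Hypothetical graph: Z -> X, Z -> Y, X -> Y
EDGES = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
print(undirected_paths(EDGES, "X", "Y"))  # [['X', 'Y'], ['X', 'Z', 'Y']]
```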

Backdoor path

In a causal DAG, a backdoor path is a noncausal path between treatment and outcome that remains even if all arrows pointing from treatment to other variables (the descendants of treatment) are removed. That is, the path has an arrow pointing into treatment.

Put simply, it is a path that has an arrow pointing into the independent variable.
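One way to sketch this in code is to enumerate all paths between treatment and outcome (ignoring direction) and keep those whose first edge points into the treatment. The graph and the function names below are hypothetical.

```python
def simple_paths(edges, start, end, path=None):
    """Yield every simple path from start to end, ignoring edge direction."""
    path = path or [start]
    if path[-1] == end:
        yield path
        return
    for a, b in edges:
        if path[-1] in (a, b):
            nxt = b if a == path[-1] else a
            if nxt not in path:
                yield from simple_paths(edges, start, end, path + [nxt])


def backdoor_paths(edges, treatment, outcome):
    """Keep the paths whose first edge is an arrow INTO the treatment."""
    edge_set = set(edges)
    return [
        p for p in simple_paths(edges, treatment, outcome)
        if (p[1], p[0]) in edge_set
    ]


# Hypothetical graph: U -> X, U -> Y, X -> M, M -> Y
EDGES = [("U", "X"), ("U", "Y"), ("X", "M"), ("M", "Y")]
print(backdoor_paths(EDGES, "X", "Y"))  # [['X', 'U', 'Y']]
```

The causal path X -> M -> Y is excluded because its first arrow leaves X rather than entering it.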

Opening / closing a path

Before anything is conditioned on, a path that contains a collider is closed; otherwise it is open.
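This rule can be sketched as a check over consecutive triples on the path: a middle node is a collider when both of its neighbours on the path point into it. The sketch covers only the case where nothing has been conditioned on, and the graph and helper names are assumptions.

```python
def colliders_on_path(edges, path):
    """Return the nodes on the path that have arrows entering from both sides."""
    edge_set = set(edges)
    return [
        mid
        for left, mid, right in zip(path, path[1:], path[2:])
        if (left, mid) in edge_set and (right, mid) in edge_set
    ]


def is_open(edges, path):
    """With no conditioning, a path is open exactly when it has no collider."""
    return not colliders_on_path(edges, path)


# Hypothetical graph: U -> X, U -> Y, X -> C, Y -> C
EDGES = [("U", "X"), ("U", "Y"), ("X", "C"), ("Y", "C")]
print(is_open(EDGES, ["X", "U", "Y"]))  # True  (fork at U, no collider)
print(is_open(EDGES, ["X", "C", "Y"]))  # False (C is a collider)
```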

When analysing the effect of $X$ on $Y$, an open backdoor path interferes with estimating the (direct) effect of $X$ on $Y$, so that path needs to be closed.

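Suppose the path runs from $X$ to $Y$. There are two ways to close a path that is currently open:

  • Make $X$ randomly assigned, for example by collecting data from a randomized experiment. Every arrow into $X$ is then cut, which closes the path.
  • "Condition on" a fork or pipe on the path, that is, control for the fork or pipe variable (study the model separately at each value of that variable) so that it no longer affects the relationship between $X$ and $Y$. For a descendant, controlling for the descendant node partially closes the path.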
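Both approaches can be illustrated with a small simulation sketch. It assumes a binary fork variable $Z$, a binary treatment $X$, and a true effect of $X$ on $Y$ of 1.0; all coefficients and function names are made up for the example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate(n=20000, seed=1, randomize=False):
    """Generate (z, x, y) rows for Z -> X, Z -> Y, X -> Y (true effect 1.0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        # Randomizing X cuts the arrow Z -> X.
        p_treated = 0.5 if randomize else (0.8 if z else 0.2)
        x = 1 if rng.random() < p_treated else 0
        y = 1.0 * x + 2.0 * z + rng.gauss(0, 1)
        rows.append((z, x, y))
    return rows


def effect(rows):
    """Difference in mean Y between treated (x=1) and untreated (x=0) rows."""
    treated = [y for _, x, y in rows if x == 1]
    control = [y for _, x, y in rows if x == 0]
    return mean(treated) - mean(control)


rows = simulate()
naive = effect(rows)  # biased upward by the open path X <- Z -> Y

# Condition on Z: estimate within each stratum, then average by stratum size.
strata = [[r for r in rows if r[0] == z] for z in (0, 1)]
adjusted = sum(effect(s) * len(s) for s in strata) / len(rows)

# Alternative: randomize X, so no adjustment is needed.
randomized = effect(simulate(randomize=True))

print(round(naive, 2), round(adjusted, 2), round(randomized, 2))
```

With these assumed coefficients, the naive estimate should land well above 1.0, while the stratified and randomized estimates should both be close to the true effect.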

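When closing a path that has several nodes that could be controlled for, controlling for the one closest to $Y$ tends to give a more precise estimate of the effect of $X$ on $Y$.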

In addition, controlling for a collider opens a path that was previously closed at that collider.
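The collider effect can be seen in a short simulation sketch. Here $X$ and $Y$ are generated independently, $C$ depends on both, and "controlling" for $C$ is approximated by selecting only rows where $C$ exceeds a threshold. The threshold, coefficients, and helper names are assumptions for illustration.

```python
import random


def correlation(xs, ys):
    """Pearson correlation computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_collider(n=20000, seed=2):
    """X -> C <- Y, with X and Y independent of each other."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rng.gauss(0, 1)
        c = x + y + rng.gauss(0, 1)
        rows.append((x, y, c))
    return rows


rows = simulate_collider()
r_all = correlation([x for x, _, _ in rows], [y for _, y, _ in rows])

# Restrict to one region of the collider (a crude form of conditioning on C).
selected = [(x, y) for x, y, c in rows if c > 1.0]
r_selected = correlation([x for x, _ in selected], [y for _, y in selected])

# r_all stays near zero; r_selected turns clearly negative.
print(round(r_all, 2), round(r_selected, 2))
```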