Literature ARM | Joyce Rommens
Literature week 1
1.1 Hernan & Robins (2020) – Chapter 1: a definition of causal effect (9p)
In your life you have automatically used basic causal concepts. The authors assume prior knowledge of the definition of causal effect and the difference between association and causation. The purpose of this chapter is to introduce mathematical notation that formalizes the causal intuition that you already possess.
Individual causal effects
Humans reason about causal effects by comparing (usually only mentally) the outcome when an action A is taken with the outcome when the action A is withheld. If the two outcomes differ, we say that the action A has a causal effect, causative or preventive, on the outcome. Otherwise, we say that the action A has no causal effect on the outcome. Action A is called an intervention, an exposure, or a treatment.
For mathematical and statistical analysis, we must introduce some notation. Consider a dichotomous treatment variable A (1: treated, 0: untreated) and a dichotomous outcome variable Y (1: death, 0: survival). In this book, variables such as A and Y, which may take different values for different individuals, are referred to as random variables.
Let Y^{a=1} (read: Y under treatment a = 1) be the outcome variable that would have been observed under the treatment value a = 1, and Y^{a=0} (read: Y under treatment a = 0) the outcome variable that would have been observed under the treatment value a = 0. Y^{a=1} and Y^{a=0} are also random variables.
Capital letters represent random variables. Lower case letters denote values of a random variable.
The treatment A has a causal effect on an individual's outcome Y if Y^{a=1} ≠ Y^{a=0} for that individual. The variables Y^{a=1} and Y^{a=0} are referred to as potential outcomes or as counterfactual outcomes. Some authors prefer the term
“potential outcomes” to emphasize that, depending on the treatment that is received, either of these two outcomes can be
potentially observed. Other authors prefer the term “counterfactual outcomes” to emphasize that these outcomes
represent situations that may not actually occur (that is, counter-to-the-fact situations).
For every individual, one of the counterfactual outcomes (the one that corresponds to the treatment value the individual actually received) is in fact factual.
That is, an individual with observed treatment A equal to a has observed outcome Y equal to his counterfactual outcome Y^a. This equality can be succinctly expressed as Y = Y^A, where Y^A denotes the counterfactual Y^a evaluated at the value a corresponding to the individual's observed treatment A. The equality Y = Y^A is referred to as consistency.
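To make the notation concrete, here is a minimal Python sketch with simulated (entirely hypothetical) potential outcomes; the names y1, y0, and a are mine and stand for Y^{a=1}, Y^{a=0}, and A.

```python
# Hypothetical simulation of potential outcomes; illustration only, not from the chapter.
import numpy as np

rng = np.random.default_rng(0)
n = 10

y1 = rng.integers(0, 2, size=n)  # Y^{a=1}: outcome had the individual been treated
y0 = rng.integers(0, 2, size=n)  # Y^{a=0}: outcome had the individual been untreated
a = rng.integers(0, 2, size=n)   # A: treatment actually received

# Consistency (Y = Y^A): the observed outcome is the counterfactual outcome under
# the treatment actually received.
y = np.where(a == 1, y1, y0)

# The treatment has an individual causal effect wherever y1 != y0. This can only be
# checked because the simulation generates both counterfactuals; in real data only
# y and a are observed, never both potential outcomes of the same individual.
print("individuals with a causal effect:", int((y1 != y0).sum()), "of", n)
```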
1.2 Hernan (2002) – Causal knowledge as a prerequisite for confounding evaluation:
an application to birth defects epidemiology (8p)
In epidemiologic studies, statistical analyses are typically organized around three different sets of variables: the exposure,
the outcome, and the confounder(s). The exposure and outcome are usually determined by the causal question under
investigation. The confounders, on the other hand, are not so clearly defined; they must first be identified and then
appropriately adjusted for in the analysis. Three common approaches to confounder identification are centered on
statistical associations:
1. Automatic variable selection, such as stepwise selection. The implicit assumption underlying this approach is that,
although not all variables selected will be confounders, all important confounders will be selected.
2. Comparison of adjusted and unadjusted effect estimates. If the relative change in the estimate after adjustment for certain variable(s) is greater than, for example, 10 percent, then the variable(s) is selected. Implicit in this approach is the assumption that any variable whose adjustment substantially changes the estimate is worth adjusting for.
3. Standard rules for confounding: checking whether the necessary criteria for confounding are met. Generally, it is stated that a confounder is a variable associated with the exposure in the population, associated with the outcome conditional on the exposure (e.g., among the unexposed), and not in the causal pathway between the exposure and the outcome. A further refinement is to replace the second condition by the condition that the potential confounder is a causal risk factor or a marker for a causal risk factor.
All three strategies may lead to bias from the omission of important confounders or inappropriate adjustment for non-
confounders.
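As a hypothetical illustration of how the second strategy can fail (my own simulation, not an example from the paper): if the candidate "confounder" C is in fact a common effect of the exposure E and the outcome D, adjusting for C changes the estimate by far more than 10 percent, so the rule selects C, yet it is the adjusted estimate that is biased.

```python
# Hypothetical simulation: the change-in-estimate rule selects a collider.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
e = rng.normal(size=n)            # exposure E
d = 0.5 * e + rng.normal(size=n)  # outcome D; the true effect of E on D is 0.5
c = e + d + rng.normal(size=n)    # C is a common effect (collider) of E and D

def exposure_coef(y, *covariates):
    """OLS coefficient of the first covariate (the exposure)."""
    X = np.column_stack((np.ones(len(y)),) + covariates)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

unadjusted = exposure_coef(d, e)   # close to the true 0.5
adjusted = exposure_coef(d, e, c)  # biased (about -0.25) by conditioning on the collider
print(f"unadjusted: {unadjusted:.2f}, adjusted for C: {adjusted:.2f}")
# The relative change far exceeds 10%, so the rule would (wrongly) adjust for C.
```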
Two variables are statistically associated when one causes the other, when they share a common cause, or both. If one precedes the other, the overall association between them has two components: a spurious one, due to the sharing of common causes, and another due to the causal effect. The first component produces confounding, which can be eliminated by adjusting, stratifying, or conditioning on the common cause. The presence of common causes, and therefore of confounding, can be represented by causal diagrams known as directed acyclic graphs (DAGs). Because causes precede their effects, these graphs are acyclic: one can never start from one variable and, following the direction of the
arrows, end up at the same variable. Conditioning on a common effect, or collider, creates a spurious association between the two 'main' variables.
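Both mechanisms can be demonstrated in a short simulation (hypothetical data, my own sketch): a common cause U makes two otherwise unrelated variables marginally correlated, while conditioning on a common effect C makes two independent variables correlated within the selected stratum.

```python
# Hypothetical simulation of both sources of spurious association.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Common cause: X <- U -> Y. Marginally associated; conditioning on U removes it.
u = rng.normal(size=n)
x = u + rng.normal(size=n)
y = u + rng.normal(size=n)
stratum = np.abs(u) < 0.1  # crude conditioning: a thin stratum of U
print(f"common cause, marginal corr:  {np.corrcoef(x, y)[0, 1]:+.2f}")  # nonzero
print(f"common cause, within stratum: {np.corrcoef(x[stratum], y[stratum])[0, 1]:+.2f}")  # ~0

# Collider: X -> C <- Y. Marginally independent; conditioning on C creates association.
x2, y2 = rng.normal(size=n), rng.normal(size=n)
c = x2 + y2 + rng.normal(size=n)
selected = c > 1  # e.g., the analysis is restricted to high values of C
print(f"collider, marginal corr:      {np.corrcoef(x2, y2)[0, 1]:+.2f}")  # ~0
print(f"collider, conditional on C:   {np.corrcoef(x2[selected], y2[selected])[0, 1]:+.2f}")  # negative
```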
Selection bias induces noncomparability or, equivalently, lack of exchangeability of the exposed and the unexposed, even if
they were comparable before the selection. Many authors use noncomparability as a synonym for confounding. We are
being careful to separate confounding due to unmeasured common causes from noncomparability induced by selection.
Conclusion
We have argued that knowledge of the causal structure is a prerequisite to accurately label a variable as a confounder.
Taken literally, this statement may impose such an unrealistically high standard on the epidemiologist that many studies
simply could not be done at all. Instead, we wish to emphasize that causal inference from observational data requires prior
causal assumptions or beliefs, which must be derived from subject-matter knowledge, not from statistical associations
detected in the data. In general, investigators should not adjust for a variable C unless they believe it may be a confounder.
At the very least, researchers should generally avoid stratifying on variables affected by either the exposure or the
outcome.
We have used causal diagrams to describe three possible sources of statistical association between two variables: cause
and effect, sharing of common causes, and calculation of the association within levels of a common effect. There is
confounding when the association between exposure and disease includes a noncausal component attributable to their
having an uncontrolled common cause. There is selection bias when the association between exposure and disease
includes a noncausal component attributable to restricting the analysis to certain level(s) of a common effect of exposure
and disease or, more generally, to conditioning on a common effect of variables correlated with exposure and disease. In
either case, the exposed and the unexposed in the study are not comparable, or exchangeable, which is the ultimate
source of the bias. Statistical criteria are insufficient to characterize either confounding or selection bias.
1.3 Hernan (2016) – Does water kill? A call for less casual causal inferences (6p)
The potential outcomes approach provides conceptual definitions and supports analytic methods for researchers interested
in producing and interpreting numerical estimates of causal effects.
Key conditions for the identifiability of causal effects from observational data:
1. Consistency (has two main components)
I. Sufficiently well-defined interventions
It is impossible to provide an absolutely precise definition of a version of treatment. On the one hand, this is problematic because, when there are multiple versions of treatment a and different versions lead to different outcomes, causal effects are not well defined. On the other hand, the problem is not as serious as it seems, because absolute precision in the definition of the versions of treatment is not needed. Further specification of versions of treatment is required
only until no meaningful vagueness remains. The best we can do is to specify the versions of treatment with as much
detail as we believe necessary, which is precisely what the protocols of randomized experiments do. A sufficiently well-
defined intervention needs to specify the start and end of the intervention and the implementation of its different
components over time. Forcing us to refine the causal question, until it is agreed that no meaningful vagueness
remains, is an essential contribution of quantitative counterfactual theory to science.
II. Linkage between interventions and the data
This is all about the equals sign in the consistency condition Y^a = Y. Sufficiently well-defined interventions a allow us to interpret the potential outcome Y^a but not necessarily to obtain an effect estimate from a data set in which the
existing versions of treatment cannot be linked to the interventions a. One way out of this problem is to assume that
the effects of all versions of treatment are identical or at least all in the same direction. In some cases, this may be a
reasonable assumption. In other cases, however, the assumption seems to go against the available evidence. Any
scientific discussion about whether all or some versions of treatment lead to the same causal conclusion rests, again,
on expert consensus and judgment.
2. Exchangeability and positivity (together: ignorability)
The set of confounders required to achieve conditional exchangeability depends on the intervention and outcome of
interest, and the same goes for the set of confounders over which positivity is required. The formal expressions of the conditions of exchangeability and positivity (sometimes jointly referred to as ignorability) under multiple versions of treatment are similar to those of the conditions for the identification of direct effects. We need to know the interventions of
interest for a successful emulation of a target trial that intervenes on them, that is, to adjust for confounding. Because
these other versions of treatment may not be directly manipulable, estimating their effect is not of primary interest for
those who need to make decisions about clinical or policy interventions that are available at this time.
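As a minimal sketch of what a positivity check can look like in practice (the data and stratum structure are entirely hypothetical): within every stratum l of the measured confounders L, the probability of treatment must be strictly between 0 and 1.

```python
# Hypothetical positivity check: Pr[A = 1 | L = l] must satisfy 0 < p < 1 per stratum.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 5_000
L = rng.integers(0, 4, size=n)       # confounder stratum (0..3)
A = rng.binomial(1, 0.1 + 0.2 * L)   # treatment probability depends on L

p_treated = pd.Series(A).groupby(L).mean()  # estimated Pr[A = 1 | L = l]
print(p_treated)
violations = p_treated[(p_treated == 0) | (p_treated == 1)].index.tolist()
print("strata violating positivity:", violations)  # empty in this simulated design
```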
Conclusions
The goal of the potential outcomes framework is not to identify causes or to “prove causality”, as is sometimes said.
Rather, quantitative counterfactual inference helps us predict what would happen under different interventions, which
requires our commitment to define the interventions of interest.
In summary, the potential outcomes framework does not limit the scope of causal questions; it simply makes transparent the interpretability of our effect estimates and their reliance on data. We owe this transparency to those who will
ground their decisions on the results of our research.
1.4 Rohrer (2018) – Thinking clearly about correlations and causation: graphical
causal models for observational data (13p)
Whereas most researchers are aware that randomized experiments are considered the “gold standard” for causal
inference, manipulation of the independent variable of interest will often be unfeasible, unethical, or simply impossible. In
this article, I discuss how causal inferences based on observational data can be improved by the use of directed acyclic
graphs (DAGs), which provide visual representations of causal assumptions.
A Brief Introduction to Directed Acyclic Graphs
From observational data alone, one cannot conclude that A causes B, because there might be variables C, D, E, … that affect B (and/or A). They might even fully account for the outcomes.
One popular way to think about DAGs is to interpret them as nonparametric SEMs (Elwert, 2013), a comparison that
highlights a central difference between DAGs and SEMs. Whereas SEMs encode assumptions regarding the form of the
relationship between the variables (i.e., by default, arrows in SEMs indicate linear, additive relationships, unless indicated
otherwise), an arrow in a DAG might reflect a relationship following any functional form (e.g., polynomial, exponential,
sinusoidal, or step function). Furthermore, in contrast to SEMs, DAGs allow only for single-headed arrows, which is why
they are called directed graphs. Sometimes, there might be a need to indicate that two variables are noncausally associated
because of some unspecified common cause, U. A double-headed arrow could be used to indicate such an association (i.e.,
A ↔ B), but this would just be an abbreviation of A ← U → B, which again contains only single-headed arrows.
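A minimal sketch, using the networkx library, of how this convention can be encoded (the node names are illustrative): the double-headed arrow A ↔ B is spelled out as A ← U → B, so the graph contains only single-headed arrows.

```python
# The A <-> B abbreviation expanded into single-headed arrows via a common cause U.
import networkx as nx

dag = nx.DiGraph()
dag.add_edge("U", "A")  # unspecified common cause of A ...
dag.add_edge("U", "B")  # ... and B: together, A <- U -> B
assert nx.is_directed_acyclic_graph(dag)  # only single-headed arrows, no cycles
print(list(dag.successors("U")))  # ['A', 'B']
```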
Paths and elementary causal structures
From these two simple building blocks—nodes and arrows—one can visualize more complex situations and trace paths
from variable to variable. In the simplest case, a path leads just from one node to the next one; an example is the path
intelligence → income. Paths can also include multiple nodes. For example, intelligence and income are additionally
connected by the paths intelligence → educational attainment → income and intelligence → grades → educational
attainment → income. A path can also travel against the direction indicated by the arrows, as, for example, does the
following path connecting educational attainment and income: educational attainment ← grades ← intelligence → income.
Although such paths can become arbitrarily long and complex, they can be broken down into three elementary causal structures (a classification sketch follows this list):
- Chains: A → B → C.
o A path that only consists of chains can transmit a causal association. Along such a chain, variables that
are directly or indirectly causally affected by a certain variable are called its descendants; conversely,
variables that directly or indirectly affect a certain variable are considered its ancestors.
- Forks: A ← B → C
o A path that also contains forks, such as educational attainment ← grades ← intelligence → income, still
transmits an association—but it is no longer a causal association because of the confounding variable (in
this case, intelligence).
- Inverted forks: A → B ← C
o And a path that contains an inverted fork is blocked: No association is transmitted. For example, the
path educational attainment → income ← intelligence → grades does not transmit a correlation
between educational attainment and grades.
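The promised classification sketch, again using networkx (my own illustration): it encodes the running example as a DAG and labels each intermediate node on the path educational attainment ← grades ← intelligence → income as a chain, fork, or inverted fork.

```python
# Classify the middle node of each length-3 segment along a path.
import networkx as nx

dag = nx.DiGraph([
    ("intelligence", "income"),
    ("intelligence", "educational attainment"),
    ("intelligence", "grades"),
    ("grades", "educational attainment"),
    ("educational attainment", "income"),
])

def classify(prev, mid, nxt):
    arrows_into_mid = (dag.has_edge(prev, mid), dag.has_edge(nxt, mid))
    if arrows_into_mid == (True, True):
        return "inverted fork (collider): blocks the path"
    if arrows_into_mid == (False, False):
        return "fork (common cause)"
    return "chain"

path = ["educational attainment", "grades", "intelligence", "income"]
for prev, mid, nxt in zip(path, path[1:], path[2:]):
    print(f"{mid}: {classify(prev, mid, nxt)}")
# grades: chain
# intelligence: fork (common cause)
```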
No way back: acyclicity
DAGs are acyclic because they do not allow for cyclic paths in which variables become their own ancestors: a variable cannot causally affect itself. A feedback loop between variables can nonetheless be modeled in a DAG (to some extent) by taking the temporal order into account and adding nodes for repeated measures.
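A tiny sketch of this time-indexing trick (the node names are hypothetical): unrolled over measurement occasions, the "feedback" between A and B remains acyclic, whereas the literal loop is not a DAG.

```python
# Feedback modeled with repeated measures stays acyclic; a literal loop does not.
import networkx as nx

unrolled = nx.DiGraph([
    ("A_t1", "B_t2"),  # A at time 1 affects B at time 2 ...
    ("B_t2", "A_t3"),  # ... which in turn affects A at time 3
    ("A_t1", "A_t3"),  # stability of A over time
])
assert nx.is_directed_acyclic_graph(unrolled)

loop = nx.DiGraph([("A", "B"), ("B", "A")])  # the forbidden cycle
assert not nx.is_directed_acyclic_graph(loop)
```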
Confounding: The Bane of Observational Data
The central problem of observational data is confounding, that is, the presence of a common cause that lurks behind the
potential cause of interest (the independent variable; in experimental settings, often called the treatment) and the
outcome of interest (the dependent variable). Such a confounding influence can introduce what is often called a spurious
correlation, which ought not to be confused with a causal effect. If we want to derive a valid causal conclusion, we need to
build a causal DAG that is complete because it includes all common causes of all pairs of variables that are already included
in the DAG. After such a DAG is built, back-door paths can be discerned. Back-door paths are all paths that start with an arrow pointing into the independent variable.
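Under that definition, back-door paths can be enumerated mechanically. Here is a sketch (my own, on a fragment of the running example) that walks the undirected skeleton of the DAG and keeps the paths whose first edge points into the exposure.

```python
# List back-door paths: undirected paths whose first edge points INTO the exposure.
import networkx as nx

dag = nx.DiGraph([
    ("intelligence", "educational attainment"),
    ("intelligence", "income"),
    ("educational attainment", "income"),
])

def backdoor_paths(dag, exposure, outcome):
    skeleton = dag.to_undirected()
    for path in nx.all_simple_paths(skeleton, exposure, outcome):
        if dag.has_edge(path[1], exposure):  # first step goes against an arrow into the exposure
            yield path

for p in backdoor_paths(dag, "educational attainment", "income"):
    print(" - ".join(p))  # educational attainment - intelligence - income
```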