Controlling for a variable means including it as a covariate in a statistical model to account for its potential influence on the outcome of interest.
Controlling for a variable and conditioning on it are two statistical techniques used to address confounding variables, which, if left unaddressed, lead to biased estimates of causal effects.
Controlling for a variable involves including it as a covariate in the statistical model, which adjusts for the potential influence of the confounding variable on the outcome of interest. For example, if we want to estimate the effect of a new drug on blood pressure, we may need to control for age: older people tend to have higher blood pressure regardless of the drug they take, and if age also influences who takes the drug, it will distort a naive comparison of treated and untreated patients.
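To make this concrete, here is a minimal sketch in Python using simulated data. The data-generating numbers (the age range, the drug's assumed effect of -5 mmHg, the noise levels) and the helper `ols_coefs` are hypothetical, chosen only for illustration; the example uses NumPy's least-squares solver, not any Causal Wizard functionality. It compares a model containing the drug alone with one that also includes age as a covariate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical simulated data: age raises both the chance of taking the drug
# and blood pressure, so age is a confounder.
age = rng.uniform(20, 80, n)
drug = (rng.uniform(0, 1, n) < (age - 20) / 60).astype(float)  # older -> more likely treated
true_effect = -5.0  # assumed true effect of the drug on blood pressure (mmHg)
bp = 100 + 0.5 * age + true_effect * drug + rng.normal(0, 5, n)

def ols_coefs(y, *covariates):
    """Ordinary least squares with an intercept; returns the coefficients."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

naive = ols_coefs(bp, drug)[1]          # drug only: confounded by age
adjusted = ols_coefs(bp, drug, age)[1]  # age included as a covariate

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

Because older patients are both more likely to be treated and have higher blood pressure, the drug-only estimate comes out positive (roughly +5 with these settings), while the age-adjusted estimate lands close to the assumed -5. Linear adjustment works here because the simulated relationship with age is linear; real data may need a more flexible model.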
Conditioning on a variable involves stratifying the data according to the values of the variable and analyzing each stratum separately. Within each stratum the variable is held constant, so it cannot confound the comparison made there; the stratum-level estimates can then be reported separately or combined. For example, we could condition on sex and estimate the effect of the drug separately for men and women.
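The same idea can be sketched with stratification. In the hypothetical simulation below, sex influences both drug uptake and baseline blood pressure; the effect is estimated as a difference in means within each sex, and the two stratum estimates are then combined, weighted by stratum size. Weighting by size is one common choice, assumed here, and all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical simulated data: sex (0 = female, 1 = male) affects both the
# chance of taking the drug and baseline blood pressure.
sex = rng.integers(0, 2, n)
drug = (rng.uniform(0, 1, n) < np.where(sex == 1, 0.7, 0.3)).astype(float)
true_effect = -5.0  # assumed true effect (mmHg)
bp = 120 + 8 * sex + true_effect * drug + rng.normal(0, 5, n)

stratum_effects = {}
stratum_sizes = {}
for s in (0, 1):
    in_stratum = sex == s
    treated = bp[in_stratum & (drug == 1)]
    untreated = bp[in_stratum & (drug == 0)]
    # Within a stratum, sex is constant, so it cannot confound this comparison.
    stratum_effects[s] = treated.mean() - untreated.mean()
    stratum_sizes[s] = in_stratum.sum()

# Combine the stratum estimates, weighting each by its share of the data.
pooled = sum(stratum_effects[s] * stratum_sizes[s] for s in (0, 1)) / n
naive = bp[drug == 1].mean() - bp[drug == 0].mean()

print(f"naive difference:    {naive:.2f}")
print(f"stratified estimate: {pooled:.2f}")
```

With these settings the naive difference in means is around -1.8, because treated patients are more often male and males were given a higher baseline, whereas the pooled stratified estimate is close to the assumed -5.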
Both controlling for and conditioning on variables can be useful in different scenarios, depending on the research question and the available data. However, it is important to note that these methods assume that the confounding variable has been correctly identified and measured. If there are unmeasured confounders, then these techniques may not fully address the issue of confounding.
The image below shows three common situations in a Causal Diagram: Confounding variables, Collider variables and Mediator variables. Note the direction of the red arrows, which differ between the graphs:
Causal Wizard displays all variable types after Identification, which is triggered by the Check process. Causal Wizard will handle each type of variable appropriately, but here's the intuition behind this:
Do control for confounders.
Confounding variables affect both the Treatment and Outcome, directly or indirectly.
It is generally important to control for confounding variables to reduce bias and estimate effects accurately. However, there are situations where controlling for other types of variable actually creates or increases bias!
Do not control for colliders.
A collider variable is affected by both the Treatment and the Outcome, directly or indirectly, and is not on the causal path between them. Controlling for (conditioning on) a collider can induce a spurious association between the Treatment and the Outcome, biasing the estimated causal effect.
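A small simulation with hypothetical data shows how this happens. The Treatment and Outcome below are generated independently, so the true effect is zero, and both feed into a collider. Regressing the Outcome on the Treatment alone gives an estimate near zero; adding the collider as a covariate produces a clearly negative and entirely spurious coefficient (about -0.5 under these settings). The helper `ols_coefs` is illustrative, built on NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical simulated data: the Treatment has NO effect on the Outcome,
# but both of them cause the collider.
treatment = rng.normal(0, 1, n)
outcome = rng.normal(0, 1, n)  # generated independently of treatment
collider = treatment + outcome + rng.normal(0, 1, n)

def ols_coefs(y, *covariates):
    """Ordinary least squares with an intercept; returns the coefficients."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

without_collider = ols_coefs(outcome, treatment)[1]          # close to the true value, 0
with_collider = ols_coefs(outcome, treatment, collider)[1]   # spurious negative "effect"

print(f"without collider: {without_collider:.2f}")
print(f"with collider:    {with_collider:.2f}")
```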
Do not control for mediators.
A Mediator variable lies on the causal path between the Treatment and Outcome: it transmits some or all of the Treatment's effect to the Outcome. Controlling for a mediator blocks that indirect path, so the model no longer estimates the total causal effect you usually want; at best it recovers only the direct effect, which is a biased estimate of the total effect.
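A sketch with simulated data (hypothetical coefficients) illustrates what gets blocked. The Treatment affects the Outcome directly (coefficient 1.0) and through the mediator (2.0 times 1.5 = 3.0), for a total effect of 4.0. Regressing on the Treatment alone recovers the total effect, while including the mediator as a covariate leaves only the direct part.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Hypothetical simulated data: Treatment -> Mediator -> Outcome,
# plus a direct Treatment -> Outcome path.
treatment = rng.normal(0, 1, n)
mediator = 2.0 * treatment + rng.normal(0, 1, n)
outcome = 1.0 * treatment + 1.5 * mediator + rng.normal(0, 1, n)
# Total effect = direct (1.0) + indirect (2.0 * 1.5 = 3.0) = 4.0

def ols_coefs(y, *covariates):
    """Ordinary least squares with an intercept; returns the coefficients."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

total = ols_coefs(outcome, treatment)[1]                   # about 4.0: total effect
direct_only = ols_coefs(outcome, treatment, mediator)[1]   # about 1.0: indirect path blocked

print(f"treatment only:      {total:.2f}")
print(f"controlling for M:   {direct_only:.2f}")
```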
Do not over-control.
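Overcontrolling occurs when a variable is controlled for unnecessarily, even though it is not a confounder and is not otherwise needed to estimate the causal relationship of interest. Overcontrolling can reduce precision and create or amplify bias. For example, controlling for a variable that affects only the Treatment artificially reduces the variation in the Treatment that the model can learn from, which inflates the variance of the estimate and can magnify any bias caused by unobserved confounders.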
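The variance side of this can be sketched with simulated data. In the hypothetical setup below, `z` affects only the Treatment, so leaving it out causes no confounding. Both models estimate the Treatment coefficient without bias, but controlling for `z` removes most of the Treatment's variation, and the standard error grows by roughly a factor of four with these settings. The helper `ols_fit` is an illustrative classical (homoskedastic) OLS, not part of any library; the bias-amplification effect requires an unobserved confounder and is not simulated here.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000

# Hypothetical simulated data: z affects only the Treatment, not the Outcome,
# so there is no need to control for it.
z = rng.normal(0, 1, n)
treatment = 2.0 * z + rng.normal(0, 0.5, n)
outcome = 1.0 * treatment + rng.normal(0, 1, n)

def ols_fit(y, *covariates):
    """OLS with an intercept; returns (coefficients, classical standard errors)."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    sigma2 = residuals @ residuals / (len(y) - X.shape[1])
    std_errors = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coefs, std_errors

coefs_a, se_a = ols_fit(outcome, treatment)     # z left out
coefs_b, se_b = ols_fit(outcome, treatment, z)  # z needlessly controlled for

print(f"without z: {coefs_a[1]:.3f} (SE {se_a[1]:.3f})")
print(f"with z:    {coefs_b[1]:.3f} (SE {se_b[1]:.3f})")
```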
Controlling for variables with rare values has pros and cons.
If the variable you want to control for has some rare values, or limited variation, controlling for it may yield poor models that do not generalise well (that is, are overfitted) and/or are biased.
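However, a confounder may have rare values, and leaving it out of the model would leave its confounding unaddressed, so the trade-off has to be weighed for each variable.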
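One practical check, sketched below, is to count how many treated and untreated rows each value of a candidate confounder has, and flag values where either group is small. The function name `sparse_strata` and the threshold of 30 are hypothetical choices for illustration; a flagged value does not mean the variable should be dropped, only that estimates involving that stratum rest on little data.

```python
import numpy as np

def sparse_strata(confounder, treatment, min_count=30):
    """Hypothetical helper: return the confounder values whose stratum has
    fewer than `min_count` treated or untreated rows."""
    flagged = []
    for value in np.unique(confounder).tolist():
        in_stratum = confounder == value
        n_treated = int(np.sum(in_stratum & (treatment == 1)))
        n_untreated = int(np.sum(in_stratum & (treatment == 0)))
        if min(n_treated, n_untreated) < min_count:
            flagged.append(value)
    return flagged

# Example: a categorical confounder where "rare" appears in only 5 rows.
confounder = np.array(["common"] * 200 + ["rare"] * 5)
treatment = np.array([0, 1] * 100 + [1, 1, 1, 1, 0])
print(sparse_strata(confounder, treatment))  # → ['rare']
```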
It is not possible to control for unobserved confounders, because they are not observed!
These variables are not present in your data, but they are confounders in the Causal Diagram. If you cannot find a suitable identification method because of unobserved confounders, you can remove them from the graph; however, doing so means that part of the estimated effect may actually be due to the unobserved confounders rather than the observed Treatment.
Document unobserved confounders which you suspect may exist but aren't in the Causal Diagram. You might exclude them because: