Observed / unobserved variable

CategoriesCausal Inference , Variables , Study Design

An Observed variable is present in your data, whereas an Unobserved variable is not, and therefore its values are not available for use in inference.

In causal inference, an observed variable is a variable that can be directly measured or observed in a study or experiment. Therefore, this includes all variables included in your Dataset, as Columns.

In other contexts, it is also referred to as a measured variable or a manifest variable.

An unobserved variable may also be described as a latent variable, although this latter term also refers to variables which aren't merely not measured, but impossible to measure.

Why is observability important?

Observed variables are important in causal inference because they can be used to estimate the causal effects of one variable on another. For example, if we are interested in understanding the causal effect of smoking on lung cancer, we can measure the smoking status of individuals and the incidence of lung cancer, and use statistical methods to estimate the causal effect of smoking on lung cancer.

On the other hand, an unobserved variable (also known as a latent variable or a confounding variable) is a variable that was not, or cannot be directly measured or observed in a study or experiment. Instead, it is inferred based on its relationship with observed variables, or identified from prior domain knowledge.

Unobserved variables can often confound the relationship between two observed variables, leading to spurious or biased results. However, we can defened against this with some strategies. 

  • First, we can use domain knowledge to identify or exclude potential confounding unobserved variables. 
  • Second, we can use sensitivity analysis to estimate upper bounds on the degree of confounding that might be present.

For example, if we are interested in understanding the causal effect of exercise on heart disease, we may observe that people who exercise regularly have lower rates of heart disease. However, we cannot assume that this is a causal relationship, because there may be unobserved confounding variables that are driving both exercise behavior and heart disease risk (such as genetics, diet, or socioeconomic status). If we do not account for these unobserved variables, our estimate of the causal effect of exercise on heart disease may be biased.

Sometimes, domain knowledge can be used to avoid the need unobserved variable data, or to identify suitable proxies (alternatives).

Summary

In summary, observed variables are directly measured or observed in a study or experiment, while unobserved variables are not directly measured and may confound the relationship between observed variables. It is important to carefully consider both types of variables in causal inference to avoid bias and ensure accurate estimates of causal effects.

Related articles
In categories