Randomized Outcomes Refuter

Categories: Validation, Statistics, Causal Effect

Randomizing the outcomes quantifies how likely it is that an effect as strong as the estimated one would be observed by chance alone.

NOTE: Please read this article for a detailed guide to refutation and statistical significance testing in DoWhy, a Python library used in Causal Wizard.

The importance of Refutation

Refutation is a key concept in causal inference, which refers to the process of testing a hypothesis by attempting to prove it false. One way to do this is by using a randomized outcome. 

Falsifying or refuting an outcome should not be seen as a disappointment:

"The number of scientific papers published every year continues to increase, but scientific knowledge is not progressing at the same rate. Here we argue that a greater emphasis on falsification – the direct testing of strong hypotheses – would lead to faster progress by allowing well-specified hypotheses to be eliminated."

Strict refutation helps to ensure - but does not guarantee - that results are sound and trustworthy.

How the Randomized Outcomes Refuter works

Randomizing the outcomes should destroy any causal effect, because the outcome is no longer affected by the treatment at all.

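Causal Wizard adopts a non-parametric statistical significance test of the core causal result (non-parametric meaning it needs no assumptions about the data distribution): the outcomes are repeatedly permuted and the model is re-fitted each time. Strictly speaking this is a permutation test; it is sometimes labelled a bootstrap method, but the bootstrap proper resamples with replacement rather than shuffling. Permutation is used (rather than creating "random" outcomes) because it's an easy way to generate statistically realistic outcome values: every permuted dataset contains exactly the original outcome values, only reassigned to different rows.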

We then look at how often a causal effect at least as strong as the original estimate is obtained from models fitted to these permuted datasets, which by construction contain no causal effect. If such an effect is very rare with permuted outcomes, this suggests there was a causal effect in the original data. The p-value is derived from this frequency: the fraction of permuted datasets that yield an effect at least as strong as the original estimate.
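
The procedure described above can be illustrated with a minimal sketch in Python. This is not Causal Wizard's or DoWhy's actual implementation. It assumes a binary treatment, uses a simple difference in means as a stand-in for the fitted causal model, and runs on synthetic data. The function names, the two-sided comparison and the "+1" adjustment to the p-value (a common convention that keeps the p-value from being exactly zero) are all assumptions made for illustration.

```python
import numpy as np


def estimate_effect(treatment, outcome):
    """Stand-in 'model': difference in mean outcome, treated minus untreated."""
    return outcome[treatment == 1].mean() - outcome[treatment == 0].mean()


def permutation_p_value(treatment, outcome, num_permutations=1000, seed=0):
    """Re-estimate the effect on permuted outcomes and return (estimate, p-value)."""
    rng = np.random.default_rng(seed)
    observed = estimate_effect(treatment, outcome)

    # Each permutation keeps the same outcome values but breaks any link
    # between treatment and outcome, so the true effect is zero.
    null_effects = np.array([
        estimate_effect(treatment, rng.permutation(outcome))
        for _ in range(num_permutations)
    ])

    # Count permuted datasets whose effect is at least as strong (in absolute
    # value) as the original estimate. The +1 terms are an assumed convention.
    num_as_strong = np.sum(np.abs(null_effects) >= abs(observed))
    p_value = (num_as_strong + 1) / (num_permutations + 1)
    return observed, p_value


# Synthetic data (hypothetical): the treatment raises the outcome by 2.0.
data_rng = np.random.default_rng(42)
n = 200
treatment = data_rng.integers(0, 2, size=n)
outcome = 2.0 * treatment + data_rng.normal(size=n)

observed, p_value = permutation_p_value(treatment, outcome)
print(f"estimated effect: {observed:.3f}, p-value: {p_value:.4f}")
```

With a real causal model, estimate_effect would be replaced by re-fitting that model on each permuted dataset, which is why the test can be slow for expensive estimators.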