Model Selection


Tips on choosing an appropriate model

Fixed Effects models for Panel Data

Causal Wizard provides Fixed-Effects regression models if your data is in Panel Data format. These methods are less flexible than the various Causal Machine Learning methods available via the Potential Outcomes Framework (described below).

To enable the Fixed-Effects models for study designs like Difference-in-Differences, follow the instructions in our Study Method and Design article. Our default Method and Design provides models from the DoWhy library via the Potential Outcomes Framework, described below.
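As a rough illustration of what a Difference-in-Differences estimate measures, the sketch below computes it directly from group means. In a balanced design with two groups and two periods, a regression with unit and period fixed effects gives the same number. The panel rows and the function names (cell_mean, did_estimate) are hypothetical and are not part of Causal Wizard.

```python
from statistics import mean

# Hypothetical panel rows: (unit, period, treated_group, outcome).
# period 0 is before the intervention, period 1 is after.
panel = [
    ("a", 0, 1, 10.0), ("a", 1, 1, 15.0),
    ("b", 0, 1, 12.0), ("b", 1, 1, 17.0),
    ("c", 0, 0, 8.0), ("c", 1, 0, 10.0),
    ("d", 0, 0, 9.0), ("d", 1, 0, 11.0),
]


def cell_mean(rows, treated_group, period):
    """Average outcome for one (group, period) cell."""
    return mean(y for _, p, g, y in rows if g == treated_group and p == period)


def did_estimate(rows):
    """Change in the treated group minus change in the control group."""
    treated_change = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)
    control_change = cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)
    return treated_change - control_change


effect = did_estimate(panel)
print(effect)  # treated change 5.0 minus control change 2.0 gives 3.0
```

The estimate is only meaningful if the two groups would have followed parallel trends without the intervention, which the data alone cannot confirm.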

Potential Outcomes Framework (default)

If you use our Causal Diagram editor you'll have the option to Identify a causal effect and estimate it using the Potential Outcomes framework, implemented in DoWhy.

Causal Wizard currently supports the default set of methods in DoWhy. These are listed below, along with considerations when using them. More advanced models will be offered in future.

In general, Causal Wizard will not offer methods that are not compatible with the identified estimand. However, a method being compatible does not make it a good choice.

Our overall advice is to compare at least two methods and observe how the results change. We also recommend using Linear Regression where possible as one comparison point. The exception is when the system is clearly behaving in a nonlinear way. For nonlinear systems, you can use the Double ML method. Even then, we still recommend a linear regression baseline for comparison.
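To make the "compare two methods" advice concrete, here is a minimal sketch on synthetic, noise-free data where the outcome depends nonlinearly on a common cause x. It compares a linear regression baseline with exact stratification on x. Stratification is used here only as a simple stand-in for a more flexible method; it is not Double ML, and none of this is Causal Wizard's or DoWhy's implementation. All names and data are assumptions made for illustration.

```python
from statistics import mean


def slope(xs, ys):
    """OLS slope of ys on xs (with an intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def residuals(xs, ys):
    """Residuals of ys after a simple linear regression on xs."""
    b = slope(xs, ys)
    a = mean(ys) - b * mean(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def linear_effect(t, y, x):
    """Coefficient on t in the regression y ~ t + x, computed by
    partialling x out of both t and y (Frisch-Waugh-Lovell)."""
    return slope(residuals(x, t), residuals(x, y))


def stratified_effect(t, y, x):
    """Size-weighted average of treated-minus-control means within each x level."""
    total = 0.0
    for level in sorted(set(x)):
        treated = [yi for ti, yi, xi in zip(t, y, x) if xi == level and ti == 1]
        control = [yi for ti, yi, xi in zip(t, y, x) if xi == level and ti == 0]
        weight = (len(treated) + len(control)) / len(x)
        total += weight * (mean(treated) - mean(control))
    return total


# Synthetic data: the true effect of t on y is 2.0, and y is nonlinear in x.
x = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
t = [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1]
y = [2.0 * ti + xi ** 2 for ti, xi in zip(t, x)]

linear = linear_effect(t, y, x)
stratified = stratified_effect(t, y, x)
print(round(linear, 3), round(stratified, 3))  # about 2.276 versus 2.0
```

The gap between the two estimates gives a feel for how much the nonlinearity matters in this data set; on real data neither number is known to be the true effect.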

Given a choice of multiple estimand types (e.g. Backdoor and IV), we recommend trying both, with a preference for Backdoor.

Backdoor Methods

The following methods are available when a backdoor estimand has been identified:

Propensity Score Methods: These are available only if one or more common-cause variables is identified. These methods are often a good choice. Follow the link to learn about their characteristics.
- Propensity Score-based Inverse Weighting
- Propensity Score Matching
- Propensity Score Stratification
Linear Regression: A good baseline when it is reasonable to expect linear behaviour from all interactions, and worth comparing to when this assumption is only slightly violated, if only to understand how important the nonlinearities are!
Generalized Linear Models (GLM): A more powerful model than Linear Regression. The type of GLM used depends on the cardinality and type of the Outcome variable.
- If the Outcome variable is boolean (True/False) or binary (0/1), the Binomial model is used.
- If the Outcome variable is numeric, the Poisson model is used.
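As an illustration of the idea behind Propensity Score-based Inverse Weighting, the sketch below estimates the propensity score as the treated fraction within each level of a single discrete common cause, then reweights outcomes by the inverse of that score. This is a simplified stand-in, assuming one discrete confounder and noise-free synthetic data; real tools typically fit a model such as logistic regression for the propensity score. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def propensity_by_stratum(t, x):
    """Estimated P(t = 1 | x) as the treated fraction within each level of x."""
    counts = defaultdict(lambda: [0, 0])  # level -> [n_treated, n_total]
    for ti, xi in zip(t, x):
        counts[xi][0] += ti
        counts[xi][1] += 1
    return {level: n_treated / n for level, (n_treated, n) in counts.items()}


def ipw_effect(t, y, x):
    """Inverse-propensity-weighted difference in mean outcomes.
    Requires 0 < propensity < 1 in every stratum (overlap)."""
    e = propensity_by_stratum(t, x)
    n = len(y)
    treated_mean = sum(ti * yi / e[xi] for ti, yi, xi in zip(t, y, x)) / n
    control_mean = sum((1 - ti) * yi / (1 - e[xi]) for ti, yi, xi in zip(t, y, x)) / n
    return treated_mean - control_mean


def naive_difference(t, y):
    """Unadjusted treated-minus-control difference in means."""
    treated = [yi for ti, yi in zip(t, y) if ti == 1]
    control = [yi for ti, yi in zip(t, y) if ti == 0]
    return mean(treated) - mean(control)


# Synthetic data: x raises both the chance of treatment and the outcome.
# The true effect of t on y is 1.5.
x = [0, 0, 0, 0, 1, 1, 1, 1]
t = [0, 0, 0, 1, 0, 1, 1, 1]
y = [1.5 * ti + 3.0 * xi for ti, xi in zip(t, x)]

ipw = ipw_effect(t, y, x)
naive = naive_difference(t, y)
print(ipw, naive)  # IPW recovers 1.5; the naive difference is 3.0
```

Weights become very large when a propensity is close to 0 or 1, which is one reason to check the overlap between treated and control units before trusting this kind of estimate.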

Instrumental Variable (IV) Methods

The following methods are available when an Instrumental Variable estimand has been identified:

Instrumental Variables method
Regression Discontinuity method
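For intuition about the Instrumental Variables method, the following sketch uses the simplest IV estimator for a single instrument: the ratio of the instrument-outcome covariance to the instrument-treatment covariance. The data are synthetic and noise-free, with an unobserved confounder u that biases the ordinary regression slope. The variable names and the data-generating process are assumptions for illustration only.

```python
from statistics import mean


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def iv_effect(z, t, y):
    """Single-instrument IV estimate: cov(z, y) / cov(z, t)."""
    return covariance(z, y) / covariance(z, t)


def ols_slope(t, y):
    """Ordinary regression slope of y on t, ignoring confounding."""
    return covariance(t, y) / covariance(t, t)


# Synthetic data. z is the instrument, u is an unobserved confounder.
# z affects y only through t and is unrelated to u by construction.
z = [0, 0, 0, 0, 1, 1, 1, 1]
u = [0, 0, 1, 1, 0, 0, 1, 1]
t = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * ti + 4.0 * ui for ti, ui in zip(t, u)]  # true effect of t is 2.0

iv = iv_effect(z, t, y)
ols = ols_slope(t, y)
print(iv, ols)  # IV recovers 2.0; the confounded OLS slope is 4.0
```

The estimate is only as good as the instrument: if z influenced y by any route other than t, or if cov(z, t) were close to zero, the ratio would be biased or unstable.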

Frontdoor Methods

The following methods are available when a Frontdoor estimand has been identified:

Two-stage regression (note: All interactions must be linear; no model is currently supported when interactions are NON-linear and ONLY a Frontdoor estimand has been identified).
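One common way to form a two-stage Frontdoor estimate under the linearity assumption is sketched below: first regress the mediator on the treatment, then regress the outcome on the mediator while controlling for the treatment, and multiply the two coefficients. This is an illustrative sketch on synthetic data with a mediator m and an unobserved confounder u; it is not necessarily how DoWhy or Causal Wizard implement the method, and all names are hypothetical.

```python
from itertools import product
from statistics import mean


def slope(xs, ys):
    """OLS slope of ys on xs (with an intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def residuals(xs, ys):
    """Residuals of ys after a simple linear regression on xs."""
    b = slope(xs, ys)
    a = mean(ys) - b * mean(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def frontdoor_two_stage(t, m, y):
    """Product of the t -> m slope and the m -> y coefficient given t."""
    stage1 = slope(t, m)
    # Coefficient on m in y ~ m + t, via partialling t out of m and y.
    stage2 = slope(residuals(t, m), residuals(t, y))
    return stage1 * stage2


# Synthetic data built from a full factorial of (u, w, e) so that the
# mediator noise e is unrelated to both u and t in the sample.
rows = list(product([0, 1], [0, 1], [-1, 1]))
u = [r[0] for r in rows]                          # unobserved confounder
t = [r[0] + r[1] for r in rows]                   # treatment depends on u
m = [0.5 * ti + r[2] for ti, r in zip(t, rows)]   # mediator depends on t only
y = [3.0 * mi + 4.0 * ui for mi, ui in zip(m, u)]  # true effect: 0.5 * 3.0

frontdoor = frontdoor_two_stage(t, m, y)
naive = slope(t, y)
print(frontdoor, naive)  # frontdoor recovers 1.5; the naive slope is 3.5
```

Both stages here are linear regressions, which is why the approach breaks down when the treatment-mediator or mediator-outcome relationship is nonlinear.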
