This tutorial demonstrates the Fixed-Effects models available in Causal Wizard by working through 3 of the examples described in Matheus Facure's excellent Python Causality Handbook - "Causal Inference for the Brave and True". The datasets for each example are pre-loaded in Causal Wizard for you to use: just create a new Study and select the relevant Dataset.
To understand the methods and ideas explored in these tutorials, it would help to read the corresponding chapters of the handbook.
To summarize, Fixed-Effects models are commonly used in Econometrics to conduct a "quasi-experiment" to obtain a causal effect estimate, or to establish a cause-and-effect relationship.
Usually, several Entities (maybe individuals, but often groups) are observed over a period of time, resulting in multiple measurements of each Entity or group. This is a more general form of the popular Difference-in-Differences (DiD) technique, with the number of groups and time periods relaxed from 2 to any number. Given this experiment design, the data is typically in Panel Data format.
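As a rough illustration, Panel Data in "long" format has one row per Entity per time period, something like the small hypothetical table below (the column names here are illustrative, not those of the pre-loaded datasets):

```python
import pandas as pd

# Hypothetical long-format panel: one row per entity per time period.
panel = pd.DataFrame({
    "entity":    ["A", "A", "A", "B", "B", "B"],
    "period":    [1, 2, 3, 1, 2, 3],
    "treatment": [0, 1, 1, 0, 0, 0],      # entity A is treated from period 2 onwards
    "outcome":   [10.0, 14.0, 15.0, 9.0, 9.5, 10.0],
})
print(panel)
```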
Introduction & objectives
Imagine you wanted to explore the effect of Treatment: "marketing costs" on Outcome: "purchases" across four cities where you have shops.
This example utilizes a synthetic dataset to illustrate how misleading a simple regression model can be. Simply fitting a regression line to all the data predicts that higher marketing spend causes fewer purchases! What?!
But this picture is misleading, because as we can see, the data is grouped into four cities, and within each city, higher marketing spend causes more purchases!
If we add a "fixed-effect" to the model for each city (city indicated by dot colour), the resulting regression model predicts these straight purple lines, which indicate the effect of marketing spend has reversed:
This example shows that Entity Fixed-Effects (in this case, the Entities are cities) improve the regression model to more accurately predict purchases given marketing spend. In fact, this is an example of Simpson's Paradox.
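If you'd like to see the mechanics outside Causal Wizard, a minimal Python sketch of the two regressions is shown below. The data is a hypothetical stand-in for the tutorial dataset, and the column names (city, marketing, purchases) are assumptions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in data: cities that spend more on marketing happen to have
# fewer baseline purchases, but within each city extra spend increases purchases.
rng = np.random.default_rng(0)
frames = []
for city, (baseline, spend_centre) in enumerate([(100, 5), (80, 10), (60, 15), (40, 20)]):
    marketing = spend_centre + rng.uniform(-2, 2, size=50)
    purchases = baseline + 2.0 * marketing + rng.normal(0, 2, size=50)
    frames.append(pd.DataFrame({"city": city, "marketing": marketing, "purchases": purchases}))
df = pd.concat(frames, ignore_index=True)

naive = smf.ols("purchases ~ marketing", data=df).fit()            # no fixed effects
fixed = smf.ols("purchases ~ marketing + C(city)", data=df).fit()  # city fixed effects
print(naive.params["marketing"])  # negative: the misleading pooled slope
print(fixed.params["marketing"])  # ~2: the positive within-city effect
```

The only difference between the two models is the C(city) term, which adds an intercept (fixed effect) per city - exactly the change that flips the sign of the marketing coefficient.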
We can reproduce the same result in Causal Wizard.
Steps
Reviewing the result
Causal Wizard provides a range of analyses of your data and model, but scroll down to the plot of predicted vs actual outcomes - in this case, grouped by City:
This plot looks a lot like the correct model in the second image from Matheus Facure's original article! Great.
There are some other features of the Result to take notice of, such as the validation and refutation tests.
Challenge
What would happen if you changed the Entity grouping to None? (Tip: The model would behave similarly to the first regression model displayed in the introduction - the incorrect one).
Introduction & objectives
This example creates a Two-Way Fixed Effects (TWFE) model with both Entities and Time. The experiment Design is Binary (Case/Control groups). From the original article, the problem is described as follows (emphasis added):
"Panel data methods are often used in government policy evaluation, but we can easily make an argument about why they are also incredibly useful for the (tech) industry. Companies often track user data across multiple periods of time, which results in a rich panel data structure. Not only that, sometimes experimentation is not possible, so we have to rely on other identification strategies. To explore that idea further, let’s consider a hypothetical example of a young tech company that tracks the number of people that installed its app across multiple cities.
At some point in 2021, the tech company launched a new feature in their app. It now wants to know how many new users that feature brought to the company. The rollout was gradual. Some cities got the feature in 2021-06-01. Others, in 2021-07-15. The full rollout to the rest of the cities only happens in 2022. Since our data only goes up until 2021-07-31, this last group can be considered the control group. In causal inference terms, rolling out this feature can be seen as the treatment and number of installs can be seen as the outcome. We want to know the treatment effect on the outcome, that is, the effect of the new feature on the number of installs."
In the data provided, the variable treat indicates whether users in a city received the feature (1 = yes, 0 = no).
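Under the hood, a Two-Way Fixed Effects estimate of this kind can be sketched as an OLS regression with a dummy for each entity and each time period. The snippet below is a rough illustration only - the file path and the column names unit, date and installs are assumptions; only treat is named above:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file path and column names: `installs` (outcome), `treat` (0/1),
# `unit` (entity) and `date` (time period).
df = pd.read_csv("app_installs.csv")

# Two-Way Fixed Effects: one dummy per unit plus one dummy per date.
twfe = smf.ols("installs ~ treat + C(unit) + C(date)", data=df).fit()
print(twfe.params["treat"])  # estimated effect of the feature on installs
```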
Steps
Reviewing the result
In this case, the data is synthetic and we know the true causal effect, which is 1. In the first section of your results, called Findings, you should see this reported - verify the result is correct. As in the previous tutorial, we should also check our validation and refutation tests.
Since this example includes Time, we next want to have a look at the Outcomes over Time section. This includes a plot, which at first will look very busy. That's because we are currently modelling individual fixed effects for each of the many units.
These plots are generated using Plotly.js. We can toggle individual series by clicking them in the Legend; double-clicking a series isolates it (hiding all the others), and double-clicking again shows all series. By selecting only series E.76 (i.e. unit 76) we can see 4 series for this unit:
In the case of Unit 76, we can see that the predicted values are a good match to the actual values. That's good. Additionally, we can see that after July 4th, both predictions and observations rise to the Treated level, reflecting the fact that this unit was "treated" after this date:
Additional Analysis
There are numerous other results to review, but we can also simplify the plot by grouping by City instead of Unit. Let's do that now.
This will produce a plot more similar to the one in Matheus Facure's book, but the estimated causal effect will be a little different, because we're not using Unit fixed-effects anymore - instead, we fit a cohort (city) fixed effect to all the units in the city. You could consider this model "wrong" because we aren't controlling for the same effects, but it's a good idea to explore a range of related model configurations to ensure they all behave as you'd expect.
However, the plot is now simplified because one series is plotted for each Entity - and we can confirm it matches the one in Matheus Facure's article. This plot is easy to interpret - we can see the effect of treatment on each cohort (city). Causal Wizard will always plot the Entity column, but you can create additional plots of your own if the standard ones don't present the data the way you'd like.
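In regression terms, the change from Unit to City grouping described above roughly corresponds to swapping which set of fixed-effect dummies enters the model. Continuing the hypothetical sketch from earlier (and assuming a city column exists):

```python
# Same hypothetical DataFrame `df` as before, with an assumed `city` column.
unit_fe = smf.ols("installs ~ treat + C(unit) + C(date)", data=df).fit()  # Entity = Unit
city_fe = smf.ols("installs ~ treat + C(city) + C(date)", data=df).fit()  # Entity = City (cohort)
```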
Introduction & objectives
This third example is actually the simplest. It is introduced in Chapter 13 to show the equivalence of DiD and regression under certain conditions, namely a binary treatment design, 2 time periods, and 2 groups of entities with one group treated in the second time period. We will use it for the same purpose: to show how a DiD result can be obtained in Causal Wizard.
The data has 3 columns: the Outcome deposits, an Entity column identifying the city (1 = POA, 0 = FL), and a Time column indicating whether the observation is before or after the intervention.
With that data, we want to estimate the effect of Treatment on Outcome: deposits.
Steps
Reviewing the result
In this case we have a "correct" result from Matheus Facure's article, which we expect to obtain: 6.52. This is the value of the interaction coefficient (the treated variable) he obtained from both the simple DiD calculation and OLS regression.
Your result should be somewhere between 5 and 7, probably not quite right. Why is that?
As part of Validation, Causal Wizard automatically assumes you want to keep some data for generalization testing, by default a random 10% of your data. To reproduce the original result exactly, we must use all our data for training the model.
The new causal effect should be 6.52 (or very close to it).
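For reference, the 6.52 figure can be reproduced with two equivalent calculations, as in the original article: a simple difference of group means, and an OLS regression with an interaction term. The sketch below assumes columns named entity (1 = POA, 0 = FL) and post (1 = after the intervention); only deposits is a confirmed column name, and the file path is hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical path; assumed columns `entity` and `post`.
df = pd.read_csv("billboard_deposits.csv")

# 1. Simple difference-in-differences from the four group means.
means = df.groupby(["entity", "post"])["deposits"].mean()
did = (means[1, 1] - means[1, 0]) - (means[0, 1] - means[0, 0])

# 2. OLS with an interaction term; its coefficient is the same DiD estimate.
ols = smf.ols("deposits ~ entity * post", data=df).fit()
print(did, ols.params["entity:post"])  # both should be ~6.52
```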
Parallel Trends plot
One of the important assumptions to check in a DiD study is the parallel trends assumption. This assumption is that, absent treatment, both/all Entity groups would have experienced the same change in Outcome. The Outcomes over Time plot is intended to visualise this and help you confirm it. The plot has 4 series per Entity: the actual (observed) outcomes, the model's predictions, and the counterfactual predictions under treatment and under control.
In the case of DiD, we want to confirm that the trend for the Treated entity (city of POA) is similar to the trend for the Control entity (city FL). Since we only have 2 time points, we can't verify this - but in your data, you may have multiple time periods and the Causal Wizard model will work just as well. You'll see all time points in the chart.
The plot shows counterfactuals and predictions so you can compare model predictive behaviour to observations, and visualize the effect of your intervention (treatment).
Caution: All plot results are produced from 1000 random samples from your data, or the entire data - whichever is smaller. This means that there may be small variations in all plots due to sampling. This is necessary to avoid copying large datasets entirely to your browser to render results.
Above: The outcome over time plot shows the prediction of city PoA (entity=1) deposits increasing from the green (control) line to the yellow/tan treated line over time. The actual data matches this behaviour closely. In contrast, the city FL (entity = 0) red prediction line remains equal to the Control (green) line.