Tutorial: Fixed Effects models


This tutorial will demonstrate the Fixed-Effects models available in Causal Wizard with 3 examples.

Tutorials

To illustrate and explain the Fixed-Effects models available in Causal Wizard, we will work through 3 of the examples described in Matheus Facure's excellent Python Causality Handbook - "Causal Inference for the Brave and True". The datasets for each method are pre-loaded in Causal Wizard for you to use. Just create a new Study and select the relevant Dataset.

  1. Example 1 is from Chapter 14, titled "Visualizing Fixed Effects". It shows how a simple regression fit to the data would be misleading, and produce an effect with the wrong sign! We then show how, by including the fixed-effects of each city, we are able to recover the correct effect.
  2. Example 2 is from Chapter 24 - "The promise of Panel Data". This example describes a company rolling out a new feature to users in different cities at different times, and trying to work out whether the feature increases sales - and by how much.
  3. Example 3 is from Chapter 13, titled the "DiD Estimator". It is a very simple introduction to the Difference-in-Differences method, but we will use it to show how DiD can be obtained with Causal Wizard, using the Fixed-Effects models. In Chapter 13 Matheus Facure also derives and demonstrates the equivalence between the OLS regression method used in Causal Wizard and the literal "difference in differences" calculation, which gave the method its name.

Essential Concepts

To understand the methods and ideas explored in these tutorials, it would help to read:

To summarize, Fixed-Effects models are commonly used in Econometrics to conduct a "quasi-experiment" to obtain a causal effect estimate, or to establish a cause-and-effect relationship.

Usually, several Entities (maybe individuals, but often groups) are observed over a period of time, resulting in multiple measurements of each Entity or group. This generalizes the popular Difference-in-Differences (DiD) technique: instead of exactly 2 groups and 2 time periods, any number of groups and time periods is allowed. Given this experiment design, the data is typically in Panel Data format.

Example 1: Visualizing Fixed-Effects

Introduction & objectives

Imagine you wanted to explore the effect of Treatment: "marketing costs" on Outcome: "purchases" across four cities where you have shops.

This example utilizes a synthetic dataset to illustrate how misleading a simple regression model can be. Simply fitting a regression line to all the data predicts that higher marketing spend causes fewer purchases! What?!

But this picture is misleading, because as we can see, the data is grouped into four cities, and within each city, higher marketing spend causes more purchases!

If we add a "fixed-effect" to the model for each city (city indicated by dot colour), the resulting regression model predicts these straight purple lines, which indicate the effect of marketing spend has reversed:

This example shows that Entity Fixed-Effects (in this case, the Entities are cities) improve the regression model to more accurately predict purchases given marketing spend. In fact, this is an example of Simpson's Paradox.
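The sign reversal can also be reproduced outside Causal Wizard with a few lines of Python. The following is a minimal sketch on made-up data (the city names, baselines and slope are illustrative assumptions, not the tutorial dataset). It fits one pooled regression line, then subtracts each city's mean before fitting again, which in a linear model is equivalent to adding a fixed effect per city.

```python
import random
from collections import defaultdict

def ols_slope(x, y):
    """Slope of a simple least-squares line y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def demean_by_group(values, groups):
    """Subtract each group's mean (the 'within' transformation)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

random.seed(0)
# Hypothetical cities: (typical marketing spend, baseline purchases).
# Cities that spend more happen to have a lower baseline.
cities = {"A": (2.0, 40.0), "B": (4.0, 30.0), "C": (6.0, 20.0), "D": (8.0, 10.0)}
TRUE_SLOPE = 2.0  # assumed within-city effect of spend on purchases

city, mkt_costs, purchase = [], [], []
for name, (spend, baseline) in cities.items():
    for _ in range(50):
        x = spend + random.uniform(-1.0, 1.0)
        y = baseline + TRUE_SLOPE * (x - spend) + random.gauss(0.0, 0.5)
        city.append(name)
        mkt_costs.append(x)
        purchase.append(y)

# Pooled fit ignores the grouping; within fit removes city-level differences.
pooled = ols_slope(mkt_costs, purchase)
within = ols_slope(demean_by_group(mkt_costs, city),
                   demean_by_group(purchase, city))
print(f"pooled slope:        {pooled:.2f}")
print(f"fixed-effects slope: {within:.2f}")
```

In this sketch the pooled slope comes out negative while the fixed-effects slope is close to the assumed within-city value, mirroring the two figures described above.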

We can reproduce the same result in Causal Wizard.

Steps

  1. Click Studies in the Menu bar
  2. Click the Create button
  3. Enter any name you like, such as "FE example 1"
  4. Select the Dataset named "Tutorial FE: Toy Panel" and press Create
  5. If you want to view or analyse the data itself, click the Data tab. Otherwise, skip this step.
  6. Since we want to use the Fixed Effects models, change Method to Panel Data Methods
  7. In this case, our Treatment variable is a continuous numerical value (marketing costs), so change the experiment Design to Continuous, numerical Treatment
  8. Set Treatment to mkt_costs. This tells Causal Wizard that this is our Treatment variable.
  9. Set Outcome to purchase. Causal Wizard will be estimating the effect of Marketing Costs on Purchase.
  10. In the Define Intervention section, enter 2 as the lower value and 5 as the upper value. Although we have a continuous, numerical Treatment rather than a Binary (case-control) study Design, we can still obtain counterfactual estimates at these two set-points for the Treatment variable.
  11. Set Entity grouping to city. The fixed-effects of city will be accounted for in the model separately to the effect of Treatment.
  12. Leave Time as None. There is no time concept in this dataset. The other examples do consider time.
  13. Leave Other covariates empty.
  14. Press Check to request Causal Wizard to review your setup.
  15. If you have any errors, check the steps above carefully. 
  16. Otherwise, you should be offered one model - the Fixed Effects - Linear Regression estimator. Press Calculate.
  17. After a few seconds, you should be offered a new Result to view. Click View Result.

Reviewing the result

Causal Wizard provides a range of analyses of your data and model, but scroll down to the plot of predicted vs actual outcomes - in this case, grouped by City:

This plot looks a lot like the correct model in the second image from Matheus Facure's original article! Great.

  • Black series indicate actual data, with a different symbol for each Entity (city)
  • Red series are predictions of the same data. Note these are similar to the purple lines in Matheus' correct result.
  • The green and yellow series are counterfactual results, illustrating predictions for each city at our two Treatment setpoints. The figure above has setpoints at 1.5 and 4.1. These allow us to visualise model behaviour at specific Treatment levels.

There are some other features of the Result to take notice of:

  • Counterfactual outcomes: This table contains the result of setting the Treatment value to the two setpoints for all samples, numerically.
  • Validation and refutation tests: This table contains the results of relevant statistical tests applied to the results and model. In a Fixed-Effects model, this will typically include the z-statistic (for each input variable aka model coefficient) and the F-statistic (for the model overall).
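To make the Counterfactual outcomes table more concrete: in a linear fixed-effects model, setting the Treatment to a set-point for every sample moves each sample along its own Entity's line. The sketch below uses hypothetical coefficients (the intercepts, slope and sample list are invented for illustration and are not taken from a Causal Wizard result) to show how the two set-point predictions and their difference could be computed.

```python
# Hypothetical fitted model: purchase = city_intercept + slope * mkt_costs
city_intercept = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0}  # assumed values
slope = 2.0                                                     # assumed value

# One entry per sample: the city (Entity) that sample belongs to.
sample_city = ["A", "A", "B", "C", "C", "C", "D"]

def predict_at(setpoint):
    """Predicted outcome for every sample if Treatment were set to `setpoint`."""
    return [city_intercept[c] + slope * setpoint for c in sample_city]

lower, upper = 2.0, 5.0  # the two set-points entered in Define Intervention
y_lower = predict_at(lower)
y_upper = predict_at(upper)

mean_lower = sum(y_lower) / len(y_lower)
mean_upper = sum(y_upper) / len(y_upper)
effect = mean_upper - mean_lower  # average change between the two set-points
print(mean_lower, mean_upper, effect)
```

Because the model is linear with a single slope, the average difference equals slope * (upper - lower) regardless of how samples are spread across cities; the city intercepts shift the two means but cancel in the difference.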

Challenge

What would happen if you change Entity grouping to None? (Tip: The model would behave similarly to the first regression model displayed in the introduction - the incorrect one).

Example 2: The Promise of Panel Data

Introduction & objectives

This example creates a Two-Way Fixed Effects (TWFE) model with both Entities and Time. The experiment Design is Binary (Case/Control groups). From the original article, the problem is described as follows (emphasis added):

"Panel data methods are often used in government policy evaluation, but we can easily make an argument about why they are also incredibly useful for the (tech) industry. Companies often track user data across multiple periods of time, which results in a rich panel data structure. Not only that, sometimes experimentation is not possible, so we have to rely on other identification strategies. To explore that idea further, let’s consider a hypothetical example of a young tech company that tracks the number of people that installed its app across multiple cities.

At some point in 2021, the tech company launched a new feature in their app. It now wants to know how many new users that feature brought to the company. The rollout was gradual. Some cities got the feature in 2021-06-01. Others, in 2021-07-15. The full rollout to the rest of the cities only happens in 2022. Since our data only goes up until 2021-07-31, this last group can be considered the control group. In causal inference terms, rolling out this feature can be seen as the treatment and number of installs can be seen as the outcome. We want to know the treatment effect on the outcome, that is, the effect of the new feature on the number of installs."

In the data provided, the variable treat indicates whether users in a city received the feature (1 = yes, 0 = no).
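As a rough illustration of what a Two-Way Fixed Effects model does with this kind of staggered rollout, the sketch below simulates a small balanced panel and estimates the treatment coefficient by two-way demeaning. Everything in it is an assumption made for illustration: the number of units and periods, the rollout periods, the noise level and the effect size of 1.0. It is not the tutorial dataset and not necessarily how Causal Wizard fits the model internally.

```python
import random

random.seed(1)
N_UNITS, N_PERIODS = 30, 10
TRUE_EFFECT = 1.0  # assumed effect of the feature on installs

def treat_start(unit):
    """Hypothetical staggered rollout: period at which a unit becomes treated."""
    if unit < 10:
        return 4      # early cohort
    if unit < 20:
        return 7      # late cohort
    return None       # not treated within the observed window (control)

rows = []  # (unit, period, treat, installs)
for u in range(N_UNITS):
    unit_effect = random.gauss(5.0, 2.0)   # constant over time, differs by unit
    start = treat_start(u)
    for t in range(N_PERIODS):
        time_effect = 0.3 * t              # shared by all units in period t
        treat = 1 if start is not None and t >= start else 0
        y = unit_effect + time_effect + TRUE_EFFECT * treat + random.gauss(0.0, 0.1)
        rows.append((u, t, treat, y))

def two_way_demean(rows, col):
    """Balanced panel only: x_it - unit mean - period mean + grand mean."""
    unit_sum = [0.0] * N_UNITS
    time_sum = [0.0] * N_PERIODS
    total = 0.0
    for r in rows:
        unit_sum[r[0]] += r[col]
        time_sum[r[1]] += r[col]
        total += r[col]
    grand = total / len(rows)
    return [r[col] - unit_sum[r[0]] / N_PERIODS - time_sum[r[1]] / N_UNITS + grand
            for r in rows]

d = two_way_demean(rows, 2)  # treat, with unit and time effects removed
y = two_way_demean(rows, 3)  # installs, with unit and time effects removed
twfe = sum(di * yi for di, yi in zip(d, y)) / sum(di * di for di in d)
print(f"TWFE estimate: {twfe:.2f} (assumed true effect: {TRUE_EFFECT})")
```

The demeaning shortcut is only valid for a balanced panel (every unit observed in every period); with missing observations, a regression with explicit unit and period indicators would be needed instead.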

Steps

  1. Click Studies in the Menu bar
  2. Click the Create button
  3. Enter any name you like, such as "FE example 2"
  4. Select the Dataset named "Tutorial FE: Installs" and press Create
  5. If you want to view or analyse the data itself, click the Data tab. Otherwise, skip this step.
  6. Since we want to use the Fixed Effects models, change Method to Panel Data Methods
  7. In this case, our Treatment variable is binary (0/1) and is named "treat", so do not change the experiment Design.
  8. Set Treatment to treat. This tells Causal Wizard that this is our Treatment variable.
  9. Set Outcome to installs. Causal Wizard will be estimating the effect of treat on installs.
  10. In the Define Intervention section, click Identify Groups. A popup dialog will appear. Change the Treatment Data Type to Categorical. Causal Wizard will "sniff" your data and suggest values 0 for Control and 1 for Treated groups. Click Save.
  11. Set Entity grouping to unit. The fixed-effects of each individual unit will be accounted for in the model separately to the effect of Treatment.
  12. Set Time periods to date. This will be a Two-Way Fixed-Effects model (Entities and Time).
  13. Leave Other covariates empty.
  14. Press Check to request Causal Wizard to review your setup.
  15. If you have any errors, check the steps above carefully. 
  16. Otherwise, you should be offered one model - the Fixed Effects - Linear Regression estimator. Press Calculate.
  17. After a few seconds, you should be offered a new Result to view. Click View Result.

Reviewing the result

In this case, the data is synthetic and we know the true causal effect, which is 1. In the first section of your results, called Findings, you should see this reported - verify the result is correct. As in the previous tutorial, we should also check our validation and refutation tests.

Since this example includes Time, we next want to have a look at section Outcomes over Time. This includes a plot, which at first will look very busy. That's because we are currently modelling individual fixed effects for each of the many units.

These plots are generated using Plotly.js. We can select series by clicking in the Legend; double-clicking a series will cycle through hiding it, showing it, and showing all series. By selecting only series E.76 (i.e. unit 76) we can see 4 series for this unit:

  • Actual observations (black)
  • Predicted observations (red dashed line)
  • Predictions given treatment is Control (0) - green
  • Predictions given treatment is Treated (1) - yellow/tan

In the case of Unit 76, we can see that the predicted values are a good match to the actual values. That's good. Additionally, we can see that after July 4th, both predictions and observations rise to the Treated level, reflecting the fact that this unit was "treated" after this date:

Additional Analysis

There are numerous other results to review, but we can also simplify the plot by grouping by City instead of Unit. Let's do that now.

  1. Go back to your Study
  2. Change Entity Grouping to cohort (the name of the City column)
  3. Press Check to request Causal Wizard to review your setup.
  4. If you have any errors, check the steps above carefully. 
  5. Otherwise, you should be offered one model - the Fixed Effects - Linear Regression estimator. Press Calculate.
  6. After a few seconds, you should be offered a new Result to view. Click View Result.

This will produce a plot more similar to the one in Matheus Facure's book, but the estimated causal effect will be a little different, because we're not using Unit fixed-effects anymore - instead, we fit a cohort (city) fixed effect to all the units in the city. You could consider this model "wrong" because it doesn't control for the same effects, but it's a good idea to explore a range of related model configurations to ensure they all behave as you'd expect.

However, the plot is now simplified because one series is plotted for each Entity - and we can confirm it matches the one in Matheus Facure's article. This plot is easy to interpret - we can see the effect of treatment on each cohort (city). Causal Wizard will always plot the Entity column, but you can create additional plots of your own if the standard ones don't present the data the way you'd like.

 

Example 3: The Difference-in-Differences (DiD) estimator

Introduction & objectives

This third example is actually the simplest. It is introduced in Chapter 13 to show the equivalence of DiD and regression under certain conditions, namely binary treatment design, 2 time periods, and 2 groups of entities with one group treated in the second time period. We will use it for the same purpose, to show how a DiD result can be obtained in Causal Wizard.

The data has 4 columns (3 from the original dataset, plus 1 we have added):

  • deposits: The outcome we want to estimate.
  • poa: The entity group. Value is 1 if sample is from Porto Alegre (POA) and 0 if a sample is from Florianopolis, the other city.
  • jul: The month indicator: Either 0 (May: pre-intervention) or 1 (July: Post-intervention).
  • treated: We have added an "interaction term" column which is simply poa * jul i.e. 1 iff post-intervention and city Porto Alegre. These samples are the only ones which are treated. If you use Causal Wizard for DiD on your data, you'll also need to add an interaction term like this.
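If you need to add the interaction term to your own data before uploading it, the calculation is a single multiplication per row. A minimal sketch, using made-up rows in the same column layout as above (the deposit values are hypothetical):

```python
# Hypothetical rows with the three original columns.
rows = [
    {"deposits": 170, "poa": 0, "jul": 0},  # Florianopolis, May
    {"deposits": 205, "poa": 0, "jul": 1},  # Florianopolis, July
    {"deposits": 45,  "poa": 1, "jul": 0},  # Porto Alegre, May
    {"deposits": 88,  "poa": 1, "jul": 1},  # Porto Alegre, July
]

# Interaction term: 1 only when the row is both from POA and post-intervention.
for row in rows:
    row["treated"] = row["poa"] * row["jul"]

print([row["treated"] for row in rows])
```

Only the last row, which is both from Porto Alegre and from July, ends up with treated equal to 1.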

With that data, we want to estimate the effect of Treatment on Outcome: deposits.

Steps

  1. Click Studies in the Menu bar
  2. Click the Create button
  3. Enter any name you like, such as "FE example 3"
  4. Select the Dataset named "Tutorial FE: Billboard Impact" and press Create
  5. If you want to view or analyse the data itself, click the Data tab. Otherwise, skip this step.
  6. Since we want to use the Fixed Effects models, change Method to Panel Data Methods
  7. In this case, our Treatment variable is binary (0/1) and is named "treated", so do not change the experiment Design.
  8. Set Treatment to treated. This tells Causal Wizard that this is our Treatment variable.
  9. Set Outcome to deposits. Causal Wizard will be estimating the effect of treated on deposits.
  10. In the Define Intervention section, click Identify Groups. A popup dialog will appear. Change the Treatment Data Type to Categorical. Causal Wizard will "sniff" your data and suggest values 0 for Control and 1 for Treated groups. Click Save.
  11. Set Entity grouping to poa. The fixed-effects of each city will be accounted for in the model separately to the effect of Treatment.
  12. Set Time periods to jul. This will be a Two-Way Fixed-Effects model (Entities and Time).
  13. Leave Other covariates empty.
  14. Press Check to request Causal Wizard to review your setup.
  15. If you have any errors, check the steps above carefully. 
  16. Otherwise, you should be offered one model - the Fixed Effects - Linear Regression estimator. Press Calculate.
  17. After a few seconds, you should be offered a new Result to view. Click View Result.

Reviewing the result

In this case we have a "correct" result from Matheus Facure's article, which we expect to obtain: 6.52. This is the value of the interaction coefficient (treated variable) he obtained from both the simple DiD calculation, and OLS regression.
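The equivalence between the literal "difference in differences" of group means and the regression interaction coefficient can be checked on a small scale. The sketch below uses made-up deposit values (not the Billboard Impact dataset, so it will not produce 6.52) and a small hand-written least-squares solver, so that it needs nothing beyond the Python standard library.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

data = [  # (poa, jul, deposits): hypothetical values, unequal group sizes
    (0, 0, 170.0), (0, 0, 174.0), (0, 0, 169.0),
    (0, 1, 205.0), (0, 1, 207.0),
    (1, 0, 45.0), (1, 0, 47.0),
    (1, 1, 86.0), (1, 1, 88.0), (1, 1, 90.0),
]

def cell_mean(poa, jul):
    vals = [d for p, j, d in data if p == poa and j == jul]
    return sum(vals) / len(vals)

# Literal DiD: change in the treated city minus change in the control city.
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Regression: deposits ~ 1 + poa + jul + poa*jul
X = [[1.0, p, j, p * j] for p, j, _ in data]
y = [d for _, _, d in data]
intercept, b_poa, b_jul, b_treated = ols(X, y)

print(f"difference in differences: {did:.4f}")
print(f"interaction coefficient:   {b_treated:.4f}")
```

With 2 groups and 2 periods the regression has one parameter per group-period cell, so it reproduces the four cell means exactly and the interaction coefficient matches the hand calculation up to floating-point rounding.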

Your result should be somewhere between 5 and 7, probably not quite right. Why is that? 

As part of Validation, Causal Wizard automatically assumes you want to keep some data for generalization testing, by default a random 10% of your data. To reproduce the original result exactly, we must use all our data for training the model.

  1. Go back to your Study
  2. Click Advanced Options at the bottom of the page. Some additional controls will appear.
  3. Set "Use % of all rows" to 0.
  4. Ensure "Use first N rows" is 0. These two settings will disable the test set.
  5. Press Check to request Causal Wizard to review your setup.
  6. If you have any errors, check the steps above carefully. 
  7. Otherwise, you should be offered one model - the Fixed Effects - Linear Regression estimator. Press Calculate.
  8. After a few seconds, you should be offered a new Result to view. Click View Result.

The new causal effect should be 6.52 (or very close to it).
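The reason a held-out test set shifts the estimate can be seen in a small simulation. The sketch below computes a DiD estimate on a full simulated dataset and again on a random 90% of its rows. The 10% hold-out share comes from the description above; the data, the noise level and the effect size are assumptions made for illustration, and the random split here is not necessarily how Causal Wizard selects its test rows.

```python
import random

random.seed(7)
TRUE_EFFECT = 6.5  # assumed effect, for illustration only

def simulate(n_per_cell=100):
    """Simulated (poa, jul, deposits) rows for a 2-group, 2-period design."""
    rows = []
    for poa in (0, 1):
        for jul in (0, 1):
            base = 170.0 - 120.0 * poa + 35.0 * jul + TRUE_EFFECT * poa * jul
            for _ in range(n_per_cell):
                rows.append((poa, jul, base + random.gauss(0.0, 10.0)))
    return rows

def did(rows):
    """Difference in differences of the four group-period means."""
    def mean(poa, jul):
        vals = [y for p, j, y in rows if p == poa and j == jul]
        return sum(vals) / len(vals)
    return (mean(1, 1) - mean(1, 0)) - (mean(0, 1) - mean(0, 0))

rows = simulate()
full = did(rows)

# Keep a random 90% for fitting; the other 10% would be the test set.
train = random.sample(rows, k=len(rows) * 9 // 10)
held = did(train)

print(f"all rows:     {full:.2f}")
print(f"90% of rows:  {held:.2f}")
```

Both numbers estimate the same effect, but they are computed from different rows, so they differ slightly. Only the all-rows figure can match a published result that was itself computed on all rows.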

Parallel Trends plot

One of the important assumptions to check in a DiD study is the parallel trends assumption. This assumption is that, absent treatment, both/all Entity groups would have experienced the same change in Outcome. The Outcomes over Time plot is intended to visualise this and help you confirm it. The plot has 4 series per Entity:

  • Actual observations (black)
  • Predicted observations (red dashed line)
  • Predictions given treatment is Control (0) - green
  • Predictions given treatment is Treated (1) - yellow/tan

In the case of DiD, we want to confirm that the trend for the Treated entity (city of POA) is similar to the trend for the Control entity (city FL). Since we only have 2 time points, we can't verify this - but in your data, you may have multiple time periods and the Causal Wizard model will work just as well. You'll see all time points in the chart.
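When your own data does have several pre-treatment periods, a simple numerical companion to the plot is to compare the period-to-period change in each group's mean Outcome before treatment. The sketch below is one informal way to do that; the group means and the number of pre-treatment periods are hypothetical, and this is not a formal statistical test.

```python
# Hypothetical mean Outcome per period for each group.
# The first three periods are pre-treatment; the last one is post-treatment.
control = [100.0, 104.0, 108.0, 112.0]
treated = [60.0, 64.5, 68.0, 79.0]
N_PRE = 3  # number of pre-treatment periods (assumed)

def changes(series):
    """Period-to-period differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]

# Under parallel trends, pre-treatment changes should be similar across groups.
pre_gap = [t - c for t, c in zip(changes(treated[:N_PRE]), changes(control[:N_PRE]))]

# The same comparison across the treatment date is the DiD estimate itself.
post_gap = (treated[N_PRE] - treated[N_PRE - 1]) - (control[N_PRE] - control[N_PRE - 1])

print("pre-treatment gaps in change:", pre_gap)
print("gap across treatment date:  ", post_gap)
```

Pre-treatment gaps that are small relative to the gap across the treatment date are consistent with parallel trends, although they cannot prove that the trends would have stayed parallel without treatment.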

The plot shows counterfactuals and predictions so you can compare model predictive behaviour to observations, and visualize the effect of your intervention (treatment).

Caution: All plot results are produced from 1000 random samples from your data, or the entire data - whichever is smaller. This means that there may be small variations in all plots due to sampling. This is necessary to avoid copying large datasets entirely to your browser to render results.

Above: The outcome over time plot shows the prediction of city PoA (entity=1) deposits increasing from the green (control) line to the yellow/tan treated line over time. The actual data matches this behaviour closely. In contrast, the city FL (entity = 0) red prediction line remains equal to the Control (green) line.