Tutorial 1

Does tertiary education increase lifetime wages, and if so, how much?

Introduction

This example is loosely based on the discussion in the Python Causality Handbook, Chapter 4: Confounding Bias, by Matheus Facure. It examines the effect of tertiary education on weekly wages.

We will model the problem in two different ways and compare the results, to demonstrate the importance and impact of the domain knowledge you provide when you use Causal Wizard.

Step 1: Create a new Study

  1. Click on Studies in the menu bar.
  2. In the Studies list page, click the Create button.
  3. Give the new Study any name you like (e.g. Tutorial 1).
  4. Select the Dataset with the name beginning "Tutorial 1".
  5. Click Create and the new Study will be displayed.

Step 2: Specify Treatment and Outcome

In this case study our Treatment is the variable Education (number of years' education) and the Outcome we're interested in is Weekly Wage. Specifically, we're interested in estimating the effect on wages of having more than the standard 12 years' education.

Since Education is a whole number (integer) value rather than a true/false property, we need to specify a threshold to group our sample into Control (<= 12 years' education) and Treated cohorts (> 12 years' education).
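To make the threshold rule concrete, here is a minimal sketch of how a whole-number treatment can be split into two cohorts. The sample values and the assign_cohort helper are hypothetical and only illustrate the rule described above; they are not Causal Wizard's code or the tutorial dataset.

```python
# Hypothetical years of education for six people (not the tutorial dataset).
education_years = [10, 12, 12, 13, 16, 18]

THRESHOLD = 12  # the Control / Treated threshold used in this tutorial


def assign_cohort(years, threshold=THRESHOLD):
    """Return 'Treated' if years exceeds the threshold, otherwise 'Control'."""
    return "Treated" if years > threshold else "Control"


cohorts = [assign_cohort(years) for years in education_years]
print(cohorts)
# ['Control', 'Control', 'Control', 'Treated', 'Treated', 'Treated']
```

Note that a person with exactly 12 years' education falls in the Control cohort, because the comparison is strictly "greater than".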

Treatment and Outcome settings

  1. Click Treatment and select Education.
  2. Click Outcome and select Weekly Wage.
  3. Now specify the Control / Treated Threshold. Click the Show button next to the Threshold input.
  4. A dialog will appear, showing the distribution of years' education.
  5. In the input at the top, enter 12. The plot will update.
  6. You can now see in the plot that samples with up to 12 years' education are recognised as Controls, and samples with more years' education are Treated.
  7. Press Save and the dialog will close.

Modal dialog for setting Control/Treated threshold value.

Step 3: Draw Causal Diagram

The next step is to tell Causal Wizard how these variables interact. Following the textbook, we assume that:

  • Education affects Wages
  • Intelligence (measured as IQ) affects both Education and Wages 

Causal diagram of intelligence, education and wages.

The claim here is that people who are more intelligent (if IQ measures such a thing) are more likely to spend more time in education, and that intelligence also directly affects wages.

However, the effect we're interested in is the effect of Education, not Intelligence. How can we separate these effects?

The simple answer is that we draw the diagram corresponding to our beliefs listed above.

  1. In the Causal Diagram panel, click somewhere near the middle. A dialog will appear. 
  2. In the Data Column selector, select Education.
  3. Click Save.
  4. The dialog will disappear and a node will appear in the diagram called Education.
  5. Repeat the above steps to add a node for Weekly Wage.
  6. Repeat the above steps to add a node for IQ (representing intelligence).
  7. Now we need to draw the arrows in the Causal Diagram. 
  8. Click the Draw Edges toggle button above the Causal Diagram.
  9. Click and drag an edge from Education to Weekly Wage. This represents the direct effect.
  10. Click and drag an edge from IQ to Education.
  11. Click and drag an edge from IQ to Weekly Wage.

The diagram should now look like this:

Causal diagram
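The same diagram can also be written down as a list of directed edges. The sketch below is a deliberately simplified illustration: it only looks for variables with an arrow directly into both the treatment and the outcome, which is enough for this three-node diagram but is not a full backdoor-criterion check. The helper names are hypothetical.

```python
# Edges of the causal diagram, as (cause, effect) pairs.
edges = [
    ("Education", "Weekly Wage"),
    ("IQ", "Education"),
    ("IQ", "Weekly Wage"),
]


def parents(node, edges):
    """Return the set of direct causes of a node."""
    return {cause for cause, effect in edges if effect == node}


def common_causes(treatment, outcome, edges):
    """Return variables with an arrow into both the treatment and the outcome."""
    return parents(treatment, edges) & parents(outcome, edges)


confounders = common_causes("Education", "Weekly Wage", edges)
print(confounders)  # {'IQ'}
```

IQ is the only common cause here, so it is the variable we need to adjust for when estimating the effect of Education on Weekly Wage.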

Step 4: Check the setup

  1. Press the Check button at the top of the page. 
  2. If something is not right, you'll get some warning messages - follow their instructions to complete or fix the setup.
  3. If everything went as planned, you'll see a large green block which invites you to select a Model type. Select "Backdoor: Linear regression". (This tutorial won't cover model selection criteria.)
  4. Press the Calculate button below the green block.

Choose model to use for estimation

Note that you can't click Calculate until all issues identified during the Check process have been resolved.

It may take a few minutes to get a result, depending how busy our servers are. Eventually, a dialog will pop up, inviting you to view the result. Click View Result.
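For intuition about what a backdoor linear regression estimate involves, here is a minimal sketch. It assumes the method amounts to an ordinary least-squares regression of the outcome on the treatment plus the confounder; that is our assumption for illustration, not a description of Causal Wizard's internals. The data is synthetic, with a known effect of 150 built in, and the ols_effect helper is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (an assumption for illustration, not the tutorial dataset).
iq = rng.normal(100, 15, n)
# Higher IQ makes membership of the Treated cohort more likely.
treated = (iq + rng.normal(0, 15, n) > 100).astype(float)
TRUE_EFFECT = 150.0  # effect built into the synthetic wages
wage = 200 + TRUE_EFFECT * treated + 5 * iq + rng.normal(0, 50, n)


def ols_effect(outcome, treatment, *adjust_for):
    """Return the OLS coefficient on `treatment`, adjusting for any covariates."""
    X = np.column_stack([np.ones_like(treatment), treatment, *adjust_for])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]


# Adjusting for IQ blocks the backdoor path Education <- IQ -> Weekly Wage.
adjusted = ols_effect(wage, treated, iq)
print(f"Estimated effect, adjusting for IQ: {adjusted:.1f}")
```

Because IQ is included as a covariate, the estimate should land close to the effect that was built into the synthetic data.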

Step 5: Reviewing the results

You can always come back to earlier results. If you decided to come back later while following this tutorial, you can find your result by:

  1. Clicking on Studies in the menu bar.
  2. In the Studies list page, click the Results button next to your "Tutorial 1" Study.
  3. You'll now see all Results for this Study. If you've followed this tutorial, you'll only have one. Click the Open button.

Each Result is a full page document. 

You can download Results as PDF files by clicking the Download button.

Let's look at the headline result. You should see something like this:

Some takeaway points:

  • First, the causal effect was estimated as 144.85, with a 95% confidence interval of 102 to 189. This means that in stability testing, 95% of the results on different subsamples fell in this range. Thus, if real-world conditions are statistically similar to our sample data, and we have modelled all relevant variables in our Causal Diagram, we can expect a result within this range.
  • The second bullet point tries to explain the implications of the result. Note that the units of the result are in whatever the original outcome units were (in this case, weekly wages in US Dollars).
    • Note: In other Studies, if the outcome is binary, the units will be a probability.
  • Third, Causal Wizard warns us that some validation tests failed. Causal Wizard is quite sensitive and it is common for some of these tests to fail. We should look especially closely at those validation results.
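To illustrate how an interval like this can arise from repeated subsampling, here is a minimal sketch of a percentile bootstrap. It uses synthetic data with a built-in effect of 150, and it assumes an ordinary least-squares estimator. Causal Wizard's own procedure may differ in its details, and the numbers printed here are not the tutorial's results.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Synthetic data (an assumption for illustration, not the tutorial dataset).
iq = rng.normal(100, 15, n)
treated = (iq + rng.normal(0, 15, n) > 100).astype(float)
wage = 200 + 150.0 * treated + 5 * iq + rng.normal(0, 50, n)


def ols_effect(outcome, treatment, covariate):
    """Return the OLS coefficient on `treatment`, adjusting for one covariate."""
    X = np.column_stack([np.ones_like(treatment), treatment, covariate])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]


estimates = []
for _ in range(500):
    idx = rng.integers(0, n, n)  # resample row indices with replacement
    estimates.append(ols_effect(wage[idx], treated[idx], iq[idx]))

# The middle 95% of the resampled estimates forms the interval.
lower, upper = np.percentile(estimates, [2.5, 97.5])
print(f"95% interval: {lower:.1f} to {upper:.1f}")
```

A narrow interval means the estimate barely moves between resamples; a wide one means the result depends heavily on which rows happen to be in the sample.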

The next part of the results presents the estimated effect graphically:

result plot

Our two cohorts (groups of people) are presented side by side. This is known as a "violin plot", because the shapes often resemble violins. You can see that the distribution of weekly wages for Treated is different from, and higher than, that for Controls.

The effect of the treatment (obtaining more than 12 years of education) is presented as the separation of two horizontal lines. The orange line is the mean wage of the Control cohort, and the dashed blue line is the mean wage of the Control cohort plus the estimated effect. This represents a Counterfactual outcome - the shift in this group's mean weekly wage if they had all had more education.

We can see that this shift would make the Control and Treated cohorts roughly equal in weekly wages, which seems sensible given the Causal Diagram we have provided.

Next, let's look at the validation results. They are presented in a table:

As noted above, not all tests passed. 

  • The bootstrap test examined the stability of the effect with different subsamples of the data. This test was passed, meaning the effect is strong and stable as long as our data is a representative sample.
  • The placebo treatment test failed. We wanted the effect to go to zero; instead it changed from +144 to -3, which is close to zero but not exactly zero. So perhaps this result is acceptable, or perhaps we should revisit our assumptions.
  • The third validation test added a random common cause of both variables, to see if the effect would remain unchanged. It changed from 144.85 to 144.97, so this is nearly unchanged. Again, we can use our own judgement about whether to accept this result.

These tests are sometimes quite sensitive when used with smaller sample sizes (e.g. < 1000).
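The idea behind the placebo and random common cause tests can be sketched in a few lines. This is an illustration under stated assumptions: the placebo is built by shuffling the treatment column, the random common cause is a pure-noise covariate, the estimator is ordinary least squares, and the data is synthetic. Causal Wizard's actual tests and pass/fail criteria may be implemented differently.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Synthetic data (an assumption for illustration, not the tutorial dataset).
iq = rng.normal(100, 15, n)
treated = (iq + rng.normal(0, 15, n) > 100).astype(float)
wage = 200 + 150.0 * treated + 5 * iq + rng.normal(0, 50, n)


def ols_effect(outcome, treatment, *adjust_for):
    """Return the OLS coefficient on `treatment`, adjusting for any covariates."""
    X = np.column_stack([np.ones_like(treatment), treatment, *adjust_for])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]


original = ols_effect(wage, treated, iq)

# Placebo treatment: a shuffled treatment column cannot cause wages,
# so its estimated effect should be near zero.
placebo = rng.permutation(treated)
placebo_effect = ols_effect(wage, placebo, iq)

# Random common cause: an unrelated noise covariate should leave
# the estimate almost unchanged.
noise = rng.normal(0, 1, n)
with_noise = ols_effect(wage, treated, iq, noise)

print(f"original: {original:.2f}")
print(f"placebo treatment: {placebo_effect:.2f}")
print(f"with random common cause: {with_noise:.2f}")
```

Even in this clean synthetic setting the placebo estimate is rarely exactly zero, which is one reason a strict test can fail while the result is still reasonable.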

The remaining parts of the Result explain the assumptions used, the handling and modelling process, and document the Causal Diagram you provided.

Step 6: What if we didn't consider intelligence?

Let's explore what would have happened if we didn't consider intelligence - perhaps because we wonder whether it really does cause an increase in education, or wages. We now think it's irrelevant to everything.

It's often useful to model several different Causal Diagrams to answer questions about possible confounding. We could do this as a separate Study, or by modifying this one and generating a new Result. Let's do the latter.

  1. Click on Studies in the menu bar.
  2. In the Studies list page, click the Edit button next to your "Tutorial 1" Study.
  3. You'll now be back in the Wizard editor for the Study, and see the Treatment, Outcome variables and Causal Diagram.

Now let's tell Causal Wizard we no longer think Intelligence affects anything.

  1. Above the Causal Diagram editor, click Draw Edges.
  2. Click on the edge between IQ and Education to delete it.
  3. Click on the edge between IQ and Weekly Wage to delete it.
  4. Click the Check button. There should be no issues ...
  5. Ensure the Backdoor: Linear Regression model is selected as before.
  6. Click the Calculate button, and wait for your new Result to pop up.
  7. Click View Result in the pop-up dialog.

Modified causal diagram

We found that by removing these edges, the estimated effect of education on wages increased from 144 to 229. Which result is more correct? The answer depends on your prior assumptions, encoded in the causal diagram.
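The gap between the two results is what confounding looks like numerically. The sketch below reproduces the pattern on synthetic data in which IQ really does affect both education and wages, and the true effect of education is set to 150. It assumes an ordinary least-squares estimator; the data, the coefficients, and the ols_effect helper are hypothetical, and the printed numbers are not the tutorial's results.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# Synthetic data (an assumption for illustration, not the tutorial dataset).
iq = rng.normal(100, 15, n)
treated = (iq + rng.normal(0, 15, n) > 100).astype(float)
TRUE_EFFECT = 150.0  # effect built into the synthetic wages
wage = 200 + TRUE_EFFECT * treated + 5 * iq + rng.normal(0, 50, n)


def ols_effect(outcome, treatment, *adjust_for):
    """Return the OLS coefficient on `treatment`, adjusting for any covariates."""
    X = np.column_stack([np.ones_like(treatment), treatment, *adjust_for])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]


# Diagram without IQ edges: nothing to adjust for.
naive = ols_effect(wage, treated)
# Diagram with IQ as a common cause: adjust for IQ.
adjusted = ols_effect(wage, treated, iq)

print(f"ignoring IQ: {naive:.1f}")
print(f"adjusting for IQ: {adjusted:.1f}")
```

In this synthetic world we know the diagram with IQ is the right one, so the larger unadjusted estimate is biased upwards. With real data we do not have that certainty, which is why the choice of diagram rests on domain knowledge.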

This shows how important your domain knowledge is, in the form of the causal diagram, the selection of data, and the interpretation of the results. While Causal Wizard will calculate these effects for you, your judgement and review are even more important. Review the results closely and critically. Feel free to contact us if you want to discuss one of your results.

Tip: You may also want to experiment with removing only the edge between IQ and Education, and allowing IQ to still affect wage. What happens to the estimated effect of education on wages?