Tutorial 2

Does headline-length affect Click-Through Rate (CTR)?

Introduction

Before changing a website or its content to increase engagement, it's important to understand what is actually driving user behaviour. This tutorial is derived from an article by Adam Kelleher. Adam critically examined the data behind another blog article, which claimed that a headline length of 16-18 words maximized engagement. He noticed that a few popular authors tended to pick longer headlines, yet long headlines didn't necessarily help other authors.

Using causal inference, we can repeat the analysis and examine the effect of headline length on Click-Through Rate (CTR) while controlling for author.

We will do this in two stages. 

First, we'll model the effect of headline length on click-through rate without considering authorship, and see what happens. We expect to see that longer headlines appear to lead to higher click-through rates.

But wait!

What Adam showed was that, when accounting for authorship, the effect disappeared. In this data, longer headlines don't in fact cause higher click-through rates. Better authors cause higher click-through rates!

So, we'll modify the Study to reflect that and see what happens to the results. We expect to see the (false) effect of headline-length disappear.

Step 1: Create a new Study

  1. Click on Studies in the menu bar.
  2. In the Studies list page, click the Create button.
  3. Give the new Study any name you like (e.g. Tutorial 2).
  4. Select the Dataset with the name beginning "Tutorial 2"
  5. Click Create and the new Study will be displayed.

Step 2: Specify Treatment and Outcome

We want to see whether longer titles increase click-through rate. So, select variable title_length as the Treatment and select click_through_rate as the Outcome.

Adam's model says that a "long" title is one with more than 10 words. So, click the Threshold value input to open the threshold dialog. 

Type 10 into the input, and press ENTER, then click Save. The dialog will close.
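
In code terms, the threshold simply turns the numeric title_length into a yes/no treatment. The sketch below is only an illustration: the function name and the sample word counts are hypothetical, and it assumes "more than 10" means strictly greater than 10, as described above. Causal Wizard's own handling of the boundary value may differ.

```python
def is_long_title(title_length: int, threshold: int = 10) -> bool:
    """Return True when the title has more than `threshold` words."""
    return title_length > threshold


# Hypothetical word counts, not taken from the tutorial dataset.
title_lengths = [6, 10, 11, 17]
treatment = [is_long_title(n) for n in title_lengths]

print(treatment)  # [False, False, True, True]
```

Note that a title of exactly 10 words counts as "short" under this assumption.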

Step 3: Draw Causal Diagram

The next step is to tell Causal Wizard how these variables interact. Following the original case study, we assume that:

  • title_length affects click_through_rate directly
  • author affects both title_length and click_through_rate

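The original article that Adam examined didn't consider author, which led to the claim that longer headlines lead to higher click-through rates. In this first pass we do the same: author is added as a node but is left unconnected.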

Click and add the 3 nodes to the graph: 

  1. title_length
  2. author
  3. click_through_rate

Add an edge:

  • from title_length to click_through_rate

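The graph should now show three nodes and a single edge, from title_length to click_through_rate. The author node is not connected to anything yet.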

Step 4: Check the setup

  1. Press the Check button at the top of the page. 
  2. If something is not right, you'll get some warning messages - follow their instructions to complete or fix the setup.
  3. If everything went as planned, you'll see a large green block which invites you to select a Model type. Select "Backdoor: Linear regression". (This tutorial won't cover model selection criteria.)
  4. Press the Calculate button below the green block.
  5. Click confirm.
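
With a yes/no treatment and no other variables in the model, a linear regression slope is just the difference between the average outcome of the two groups. The sketch below checks this on a tiny, made-up dataset (not the tutorial dataset). The rows and helper names are hypothetical, and it is an assumption that the "Backdoor: Linear regression" model reduces to this simple case when nothing is adjusted for.

```python
import math

# Hypothetical rows: (long_title, click_through_rate). Not the tutorial data.
rows = [(1, 0.08), (1, 0.08), (1, 0.08), (0, 0.08),
        (0, 0.04), (0, 0.04), (0, 0.04), (1, 0.04)]


def mean(values):
    return sum(values) / len(values)


def ols_slope(xs, ys):
    """Slope of a one-variable least-squares fit with an intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


treated = [t for t, _ in rows]
ctr = [y for _, y in rows]

slope = ols_slope(treated, ctr)
diff_in_means = (mean([y for t, y in rows if t == 1])
                 - mean([y for t, y in rows if t == 0]))

print(round(slope, 4), round(diff_in_means, 4))  # 0.02 0.02
```

Both numbers agree, which is why an unadjusted estimate behaves like a plain comparison of long-title and short-title articles.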

Step 5: Review the results

You can always come back to earlier results. If you decide to come back later while following this tutorial, you can find your result as follows:

  1. Click on Studies in the menu bar.
  2. In the Studies list page, click the Results button next to your Tutorial 2 Study.
  3. You'll now see all Results for this Study. If you've followed this tutorial, you'll only have one. Click the Open button.

The key finding is that a title length greater than 10 has an estimated effect of +0.01 on click-through rate. While that might not seem like much, the average click-through rate is 0.05, so +0.01 is a 20% increase!
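
The 20% figure is the estimated effect divided by the average click-through rate. As a quick check, using only the two numbers quoted above (the variable names are just for illustration):

```python
effect = 0.01        # estimated effect of a long title on click-through rate
average_ctr = 0.05   # average click-through rate quoted in the text

relative_increase = effect / average_ctr

print(f"{relative_increase:.0%}")  # 20%
```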

The problem, as Adam pointed out, is that the effect of authorship is being ignored. Let's go back to the Study and add the missing edges.

Step 6: Modify the Causal Diagram

Click "Edit Study" (or switch tabs if the Study is already open).

In the Study editor page, click the Draw Edges button to modify the causal diagram.

Add edges:

  • from author to title_length
  • from author to click_through_rate

There should now be 3 edges in total: title_length to click_through_rate, author to title_length, and author to click_through_rate.
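
The difference between the two diagrams can also be written down as plain edge lists. The sketch below is a simplification and not how Causal Wizard works internally: it only looks for variables that point directly at both the Treatment and the Outcome, which is enough for this three-node graph but is not a full backdoor analysis. The helper names are hypothetical.

```python
def parents(edges, node):
    """Nodes with an edge pointing directly at `node`."""
    return {src for src, dst in edges if dst == node}


def common_causes(edges, treatment, outcome):
    """Direct parents shared by the treatment and the outcome."""
    return parents(edges, treatment) & parents(edges, outcome)


# Diagram from Step 3: author is present as a node but has no edges.
step3_edges = [("title_length", "click_through_rate")]

# Diagram from Step 6: author now points at both other variables.
step6_edges = step3_edges + [
    ("author", "title_length"),
    ("author", "click_through_rate"),
]

print(common_causes(step3_edges, "title_length", "click_through_rate"))  # set()
print(common_causes(step6_edges, "title_length", "click_through_rate"))  # {'author'}
```

Only the second diagram tells the analysis that author is a common cause that needs to be controlled for.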

Next:

  1. Press the Check button at the top of the page. 
  2. If something is not right, you'll get some warning messages - follow their instructions to complete or fix the setup.
  3. If everything went as planned, you'll see a large green block which invites you to select a Model type. Select "Backdoor: Linear regression". (This tutorial won't cover model selection criteria.)
  4. Press the Calculate button below the green block.
  5. Click confirm.

Step 7: Review the updated results

When the calculation has finished, navigate to the new result.

You should now notice that the causal effect has disappeared. The new estimate, controlling for authorship, is 0.

Notice also that the statistical significance test now fails, suggesting that the estimated effect cannot be distinguished from chance.
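
To see why controlling for author can make an effect vanish, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the two authors, their probabilities of writing long titles, their baseline click-through rates and the noise level are invented, and title length is given no effect at all by construction. The adjusted estimate centres the data within each author, which gives the same slope as a linear regression that includes one indicator variable per author.

```python
import random

random.seed(0)

# Hypothetical authors: (probability of a long title, baseline CTR).
# Title length has NO effect on CTR in this simulation, by construction.
AUTHORS = {"popular": (0.8, 0.08), "typical": (0.2, 0.04)}

rows = []
for _ in range(4000):
    author = random.choice(sorted(AUTHORS))
    p_long, base_ctr = AUTHORS[author]
    long_title = 1.0 if random.random() < p_long else 0.0
    ctr = base_ctr + random.gauss(0.0, 0.01)
    rows.append((author, long_title, ctr))


def centre(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def slope(xs, ys):
    """Least-squares slope for already-centred xs and ys."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Naive: centre over the whole sample (regression with only an intercept).
naive = slope(centre([t for _, t, _ in rows]), centre([y for _, _, y in rows]))

# Adjusted: centre within each author before fitting the slope.
t_within, y_within = [], []
for name in AUTHORS:
    group = [(t, y) for a, t, y in rows if a == name]
    t_within += centre([t for t, _ in group])
    y_within += centre([y for _, y in group])
adjusted = slope(t_within, y_within)

print(f"naive: {naive:+.4f}  adjusted: {adjusted:+.4f}")
```

In this setup the naive estimate should come out clearly positive, because the popular author writes most of the long titles, while the adjusted estimate should sit close to zero.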

The key takeaway here is that properly accounting for a confounding variable (authorship) eliminated what originally appeared to be a promising insight. If we had just done a simple association or non-causal analysis of title length and click-through rate, we might now be "optimizing" our web content in completely the wrong way...