Testing on Facebook Ads

If you're using Facebook Ads, there's a reason that you're using it, and that's to get results. We'd all love to get great results straightaway, but deep down we know that we're not going to get them overnight. Improving results is a slow, iterative process, and it relies on testing.

Put abstractly, testing is the process of comparing results from your existing setup (called a control) against an alternate setup (called your experiment).

As marketers we try to think of experiments that will perform better than our controls. If we find that our experiment does outperform the control, we adopt the experiment and it effectively becomes the new control. We then think of new tests (new experiments) and use the results from these to continually update our setup (our control).

If the above sounds a little too abstract, don't worry; I'll break it down further. In this article we're going to cover what sort of setups we might want to test within Facebook Ads, and the often-overlooked question of how to run tests in Facebook Ads.

What sort of things can we test?

The key to testing is to realise that you can test anything that you can change. If you can change a certain setting, or image, or ad text, then it can (and should!) be tested.

Facebook Ads allows you to change a huge range of things, but this doesn't mean you should try to test everything at once. Some changes are likely to be more impactful than others, and it's our job as advertisers to go in with an idea of which changes (and therefore which tests) are likely to be the most impactful.

Some examples of things that you can change, but shouldn't necessarily test, are:

Ok, so we have some idea of what not to test. What should we be testing then? The possibilities are endless, but here are a few ideas for good things to test:

Each of these is a good, valid test to run. The control in each example above is sufficiently different from the experiment that you can expect to get significant results without a huge testing budget.
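To get a feel for what "significant" means in practice, here's a minimal sketch of a two-proportion z-test on made-up conversion numbers for a control and an experiment. The audience sizes, conversion counts and function name are illustrative assumptions, not anything Facebook reports for you.

```python
from math import sqrt, erf

def z_test_two_proportions(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate
    between a control (a) and an experiment (b)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)              # pooled rate under "no difference"
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p-value from the normal CDF
    return p_a, p_b, z, p_value

# Hypothetical numbers: 10,000 users reached per leg
p_a, p_b, z, p = z_test_two_proportions(conv_a=120, n_a=10_000,
                                         conv_b=155, n_b=10_000)
print(f"control {p_a:.2%}, experiment {p_b:.2%}, z={z:.2f}, p={p:.3f}")
# p < 0.05 would suggest the difference is unlikely to be noise
```

The bigger the gap between control and experiment, the fewer results (and the less budget) you need before the p-value drops below your threshold, which is why tests of sufficiently different setups are cheaper to run.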

How do you run these tests?

It's all well and good knowing what you want to test, but if you don't know how to run these tests fairly then you might as well not run the test in the first place.

To understand the importance of running fair tests, let's look at some ways not to run tests. In particular, let's consider the lookalike size test I gave above, and look at ways that you could, but shouldn't, run this test.

If you already have a lookalike ad set in your campaigns, a tempting way to test whether you should add a larger lookalike audience is simply to add another lookalike ad set with broader targeting.

The idea is that you can let both run for some time, and if the broader lookalike brings in good results, then you can either keep it in its own ad set, or merge it with your pre-existing lookalike ad set.

The issue with this, especially if you're using budget-based (i.e. lowest-cost) bidding, is that there's no control in this test. You're simply comparing how a broader lookalike performs against a narrower one.

But that's not the point of the test. The point of the test isn't to see how these audiences compare against one another; the point of the test is to see whether adding a broader lookalike ad set helps overall performance. The setup above can't give us any results for this test, because it doesn't include a control.

To re-structure this test so that it includes a control, we need to start again. We keep our existing campaign with the lookalike ad set as it is, and that becomes our control. Next we duplicate this campaign, and in the duplicate campaign we add the broader lookalike ad set.

We now have two campaigns that differ only in whether they contain a broader lookalike ad set. The one that doesn't is our control, and the one that does is our experiment.

Again, it's tempting to think that we can just set both of these campaigns live, let them compete against each other and see which one wins. Again, though, there's an issue with this.

If you set both campaigns live, they'll compete with each other on the narrow lookalike audience. This is because both campaigns target that audience. When audience members can be targeted by both sides of the test, your test isn't fair.

There are a number of reasons for this. One example: one campaign can stop the other from serving ads to particular users. A Facebook page can only show an ad to the same user every few hours (how long depends on the placement; e.g. 2 hours for the Facebook news feed), so if one campaign shows an ad to a high-value user, it can block the other campaign from reaching that user for a period of time.
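To make that blocking effect concrete, here's a toy simulation of the frequency logic described above. It isn't how Facebook's delivery system works internally; the two-hour gap, campaign names and user ID are just illustrative assumptions.

```python
from datetime import datetime, timedelta

FREQUENCY_GAP = timedelta(hours=2)   # assumed news-feed gap between ads from the same page

last_shown = {}   # user_id -> time the page last showed this user an ad (any campaign)

def try_to_serve(campaign, user_id, now):
    """Serve an ad only if the page hasn't reached this user within the gap."""
    last = last_shown.get(user_id)
    if last is not None and now - last < FREQUENCY_GAP:
        print(f"{now:%H:%M} {campaign}: blocked for {user_id} (seen too recently)")
        return False
    last_shown[user_id] = now
    print(f"{now:%H:%M} {campaign}: ad served to {user_id}")
    return True

start = datetime(2023, 1, 1, 9, 0)
try_to_serve("control",    "user_42", start)                          # wins the first opportunity
try_to_serve("experiment", "user_42", start + timedelta(minutes=30))  # blocked by the control
try_to_serve("experiment", "user_42", start + timedelta(hours=3))     # allowed again
```

Whichever campaign reaches a shared user first effectively locks that user away from the other one for a while, so neither leg of the test sees the audience it would have seen running alone.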

To stop the two campaigns from competing with one another, you need to divide the audiences between them. This idea is called split testing.

Split testing looks at any overlap in audience that multiple campaigns or ad sets have, and it randomly divides that overlap between the competing campaigns or ad sets. This means that the different legs of your test won't end up competing with one another.
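Facebook handles this division for you when you use its A/B testing tools, but conceptually the assignment works something like the sketch below: each user in the overlap is deterministically assigned to exactly one leg of the test, so the same person can never be reached by both. The function and test names here are hypothetical.

```python
import hashlib

def assign_leg(user_id: str, test_name: str, legs=("control", "experiment")) -> str:
    """Assign an overlapping user to exactly one leg of the test.

    Hashing (rather than picking at random each time) means the same user
    always lands in the same leg for the whole life of the test.
    """
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode()).hexdigest()
    return legs[int(digest, 16) % len(legs)]

# Every user in the narrow-lookalike overlap gets exactly one leg
for user in ["u1001", "u1002", "u1003", "u1004"]:
    print(user, "->", assign_leg(user, "broad-lookalike-test"))
```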

If we ran a split test on the test we've outlined above, we could ensure that our two campaigns didn't compete with each other on the narrow lookalike audience. Running a split test would therefore mean that we can run this test fairly.

Once we've run the test, we can then look at the two campaigns to see what difference having the broader lookalike audience made. If performance is better on the experiment campaign, and the lookalike audience brought in incremental volume, then this is a sign we should keep this campaign, and continue to target a broader lookalike.

If performance is worse on the experiment campaign, perhaps because the cost per result is higher, then we should stick with our control, and abandon the broader lookalike audience.
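Once the split test has finished, the comparison itself is simple arithmetic. Here's a minimal sketch, using made-up spend and result figures, of the two numbers discussed above: cost per result for each leg, and the incremental volume the broader lookalike brought in.

```python
# Hypothetical split-test readout (spend in account currency, results = purchases)
control    = {"spend": 5_000, "results": 250}   # narrow lookalike only
experiment = {"spend": 5_000, "results": 290}   # narrow + broad lookalike

cpr_control    = control["spend"] / control["results"]        # cost per result, control
cpr_experiment = experiment["spend"] / experiment["results"]  # cost per result, experiment
incremental    = experiment["results"] - control["results"]   # extra volume from the broad lookalike

print(f"Control CPR:    {cpr_control:.2f}")
print(f"Experiment CPR: {cpr_experiment:.2f}")
print(f"Incremental results: {incremental}")

# Keep the experiment if it adds volume without pushing cost per result above target;
# otherwise stick with the control.
```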

How do you actually run split tests though?

