Finding and Using the Most Important Factors in a Campaign with the Help of Data Science

Alice Bezett, Data Scientist

Alice Bezett is one of our resident data scientists here at Bannerconnect. She digs deep into our vast quantities of first-, second- and third-party data for the insights that give our clients’ programmatic advertising the edge to stand out from the crowd. She comes to us from a background in finance and academia, but has found her passion for statistics renewed in programmatic. Read more from Alice on how a career in programmatic has strengthened her love of statistics here.

Optimizing an online campaign is an incredibly complex problem, which is why we have so many campaign managers in our organization who are dedicated to the cause! Our campaign managers make sure to choose the best impression for the best price, but given the volume of work they have, we in the data science department wondered: is there a way for us to help out?

One of the reasons optimization is so difficult is the sheer number of variables involved. For example, a campaign manager will regularly optimize not just on placement, domain, and size of creative, but also on device type, creative type, operating system, browser, and the list goes on! The complexity comes not just from the number of variables, but also from the fact that so many of them interact with each other. For example, a creative size may perform very well on some domains but very poorly on others. Ideally, we’d like to take all these interactions into account, but how should we go about doing this?
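To make the idea of an interaction concrete, here is a small Python sketch. The domains, sizes and counts are invented purely for illustration (they are not real campaign data), and `ctr_by` is a hypothetical helper. Looking at size alone, the two creatives appear identical; only the size-by-domain view shows that each one wins on a different domain.

```python
from collections import defaultdict

# Hypothetical aggregated rows: (creative size, domain, impressions, clicks).
ROWS = [
    ("300x250", "news.example", 10_000, 50),
    ("300x250", "sport.example", 10_000, 10),
    ("728x90", "news.example", 10_000, 10),
    ("728x90", "sport.example", 10_000, 50),
]


def ctr_by(rows, key):
    """Return the click-through rate for each group defined by key(row)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [impressions, clicks]
    for row in rows:
        group = key(row)
        totals[group][0] += row[2]
        totals[group][1] += row[3]
    return {group: clicks / imps for group, (imps, clicks) in totals.items()}


# One variable at a time: both sizes have the same overall CTR.
by_size = ctr_by(ROWS, key=lambda r: r[0])

# Two variables together: the best size depends on the domain.
by_pair = ctr_by(ROWS, key=lambda r: (r[0], r[1]))

print(by_size)
print(by_pair)
```

Optimizing on size alone would find nothing to act on here, which is why a method that considers several variables at once is attractive.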

This is where maths can come to our rescue! Logistic regression can be used to create a model which gives insight into how the different variables contribute to the success of a campaign (whether success is measured by CTR, CPA, or search conversions). Details on this method can be found here.

While a full explanation of this type of regression is beyond the scope of this blog, we can still explore why we should use this model. Logistic regression can be used when the outcome you’re trying to predict has only two possibilities, such as alive/dead, sick/healthy, click/don’t click, or buy/don’t buy, and so this is the method we will use to model our situation. The model can help tell us how the different variables in our campaign may affect the probability of a click or conversion.
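For readers who like to see the mechanics, the following is a minimal sketch of a logistic regression on click/don’t click data, written in plain Python. Everything in it is an assumption made for illustration: the two factors, the made-up “true” effects, the simulated impressions, and the simple gradient-descent fit. It is not the model we run in production, and in practice you would normally reach for a statistics library such as statsmodels or scikit-learn rather than fitting by hand.

```python
import math
import random


def sigmoid(z):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical setup: two yes/no factors with invented effects on the log-odds.
# The click rates are exaggerated so that a small simulated sample is enough.
LEVELS = [("device", "mobile"), ("size", "300x250")]  # baselines: desktop, 728x90
TRUE_W = [-2.0, 1.0, 1.0]  # intercept, mobile, 300x250


def encode(row):
    """Turn a row of categories into 0/1 features, with a leading intercept."""
    return [1.0] + [1.0 if row[var] == level else 0.0 for var, level in LEVELS]


def simulate(n, seed=0):
    """Generate n fake impressions and whether each one was clicked."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = {"device": rng.choice(["mobile", "desktop"]),
               "size": rng.choice(["300x250", "728x90"])}
        x = encode(row)
        p_click = sigmoid(sum(w * v for w, v in zip(TRUE_W, x)))
        X.append(x)
        y.append(1 if rng.random() < p_click else 0)
    return X, y


def fit_logistic(X, y, lr=1.0, epochs=400):
    """Fit the weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


X, y = simulate(2000)
w = fit_logistic(X, y)
weights = dict(zip(["intercept", "device=mobile", "size=300x250"], w))
for name, value in weights.items():
    print(f"{name:>14}: {value:+.2f}  (odds x{math.exp(value):.2f})")
```

Each fitted weight is a change in the log-odds of a click relative to the baseline level, so exponentiating it gives an odds multiplier. A positive weight for `device=mobile` means the model associates mobile impressions with a higher chance of a click than desktop ones, holding creative size fixed.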

Once we have made a model, what is the next step? Given the vast number of variables that are present, it would be great if we could decide which are the most important factors in our optimization process. In other words, how do we isolate the variables that are making the biggest contribution to a campaign’s success or failure? If we knew this, we could focus only on the most important variables, and forget the rest, thereby simplifying our optimization process.

Unfortunately, this is easier said than done and is mathematically quite complex. Luckily for us though, there is a large body of research on this topic in the field of statistics, and we include some examples at the foot of this article. These articles give us several methods which we can use to calculate the relative importance of different factors in our model. In a nutshell, we can calculate which variables have enough data for their effect to be statistically significant, and we can also calculate which variables are associated with the largest (or smallest!) uplift in our campaign.
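The two questions in that last sentence, “is there enough data?” and “how big is the uplift?”, can be illustrated with a deliberately simple case: one yes/no factor compared against a baseline. The sketch below uses a Wald test on the log odds ratio, which is what a logistic regression with a single yes/no predictor gives you. It is only one basic approach, not necessarily any of the methods from the research mentioned above, and the function name and all counts are invented for the example.

```python
import math


def uplift_and_significance(clicks_a, imps_a, clicks_b, imps_b):
    """Compare group A against baseline B.

    Returns (odds_ratio, z, p_value) from a Wald test on the log odds ratio.
    All four cells (clicks and non-clicks in each group) must be non-zero.
    """
    a, b = clicks_a, imps_a - clicks_a  # group A: clicks, non-clicks
    c, d = clicks_b, imps_b - clicks_b  # group B: clicks, non-clicks
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return math.exp(log_or), z, p_value


# Hypothetical factor 1: looks like a big uplift, but on very little data.
big_uplift_small_data = uplift_and_significance(6, 1_000, 3, 1_000)

# Hypothetical factor 2: a modest uplift, but measured on plenty of data.
small_uplift_big_data = uplift_and_significance(1_200, 200_000, 1_000, 200_000)

for label, (odds_ratio, z, p) in [
    ("big uplift, small data", big_uplift_small_data),
    ("small uplift, big data", small_uplift_big_data),
]:
    print(f"{label}: odds ratio {odds_ratio:.2f}, z = {z:.2f}, p = {p:.4f}")
```

The first factor roughly doubles the odds of a click, yet with so few clicks the difference could easily be noise. The second factor has a much smaller effect, but there is enough data to trust it. Ranking factors sensibly means looking at both numbers together.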

Once we have this knowledge, we can either automate the process of optimizing on these variables or allow the campaign managers to do it manually. For example, if we find that the size of the creative is creating the biggest difference between success and failure, we can make sure that we buy more of the best sizes, and fewer (or none) of the worse ones. This ensures we make the biggest wins for the least effort, making both our data and operations teams very happy!
