2013 The Humane League/Farm Sanctuary Video vs. Ad Study
Please note that this analysis is archived: it was published in 2013 and is not up to our current standards.
This is Animal Charity Evaluators’ statistical analysis of the publicly available data from an ad comparison study performed by The Humane League and Farm Sanctuary. For more context, see ACE’s narrative analysis of this and a similar study; this document is intended to be read as a supplement to that analysis.
The code that follows is the R code used in our analysis. If you would like a copy of the data used to replicate or extend our analysis, please contact us.
```r
# Helper functions: total clicks (GAClicks) and literature orders (MFGC)
# recorded for a given video or a given ad
vidclicks <- function(vidname) {
  sum(Videos$GAClicks[which(Videos$Video == vidname)])
}
adclicks <- function(adname) {
  sum(Videos$GAClicks[which(Videos$Ad == adname)])
}
vidorders <- function(vidname) {
  sum(Videos$MFGC[which(Videos$Video == vidname)])
}
adorders <- function(adname) {
  sum(Videos$MFGC[which(Videos$Ad == adname)])
}
```
Videos
To get a sense of the variation in effectiveness between videos, we consider the overall order rate for each video (orders per click), or alternatively its reciprocal, the average number of clicks required to produce one literature order.
vidorders("FTF")/vidclicks("FTF") # or vidclicks('FTF')/vidorders('FTF') ## [1] 0.01529 vidorders("WCB")/vidclicks("WCB") # or vidclicks('WCB')/vidorders('WCB') ## [1] 0.02705 vidorders("TBL")/vidclicks("TBL") # or vidclicks('TBL')/vidorders('TBL') ## [1] 0.02373 vidorders("MYM")/vidclicks("MYM") # or vidclicks('MYM')/vidorders('MYM') ## [1] 0.02023
We test whether the differences are significant using a chi-square test. For a pair of videos, we make a table of clicks that did not result in orders and of orders, then test how likely such a table would be if the proportion of clicks resulting in orders were the same for each video.
|                 | Farm to Fridge | What Came Before |
| --------------- | -------------- | ---------------- |
| clicks − orders | 14042          | 61569            |
| orders          | 218            | 1712             |
Since we’ll be running 10 significance tests in this analysis (six pairwise comparisons of videos and four tests on various groupings of ads), we’ll keep in mind the conservative Bonferroni correction for multiple significance tests: whatever p value we would take as indicating significance for a single test, we should require one of our multiple tests to have a p value 1/10 that size or lower before concluding that the result is significant. For instance, if in running a single comparison we would look for p < .01, we should now look for p < .001.
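To make the correction concrete, the adjusted threshold is just the single-test threshold divided by the number of planned tests (a quick illustration, not part of the original code):

```r
alpha <- 0.01    # p value we would accept for a single test
n.tests <- 10    # significance tests planned in this analysis
alpha / n.tests  # Bonferroni-corrected threshold for each individual test
## [1] 0.001
```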
```r
click.mat <- cbind(c(vidclicks("FTF") - vidorders("FTF"),
                     vidclicks("WCB") - vidorders("WCB")),
                   c(vidorders("FTF"), vidorders("WCB")))
chisq.test(click.mat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  click.mat
## X-squared = 65.9, df = 1, p-value = 4.741e-16
```
Given the very low p value, the difference between these two videos is extremely unlikely to be due to chance alone. Comparisons are a bit more complicated between What Came Before and Ten Billion Lives or Meet Your Meat, since we should only use data from ads that had versions for both videos involved.
```r
# Clicks and orders for a video, restricted to the two ads ("phoebe" and
# "serj") that had versions for all of the videos being compared
someclicks <- function(vidname) {
  sum(Videos$GAClicks[which(Videos$Video == vidname &
                            (Videos$Ad == "phoebe" | Videos$Ad == "serj"))])
}
someorders <- function(vidname) {
  sum(Videos$MFGC[which(Videos$Video == vidname &
                        (Videos$Ad == "phoebe" | Videos$Ad == "serj"))])
}

click.mat <- cbind(c(someclicks("WCB") - someorders("WCB"),
                     vidclicks("TBL") - vidorders("TBL")),
                   c(someorders("WCB"), vidorders("TBL")))
chisq.test(click.mat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  click.mat
## X-squared = 9.277, df = 1, p-value = 0.002321

click.mat <- cbind(c(someclicks("WCB") - someorders("WCB"),
                     vidclicks("MYM") - vidorders("MYM")),
                   c(someorders("WCB"), vidorders("MYM")))
chisq.test(click.mat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  click.mat
## X-squared = 6.752, df = 1, p-value = 0.009366
```
With the Bonferroni correction in mind, these results are not significant at the overall level of p < .01.
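To make this explicit (a supplementary check rather than part of the original analysis), R’s built-in `p.adjust` can scale each observed p value by the number of planned tests; both adjusted values land above .01:

```r
# Bonferroni adjustment multiplies each p value by the number of tests (capped
# at 1); the results can then be compared against the single-test threshold of .01
p.adjust(c(0.002321, 0.009366), method = "bonferroni", n = 10)
## [1] 0.02321 0.09366
```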
The other three possible pairwise tests give less significant results:
```r
click.mat <- cbind(c(someclicks("FTF") - someorders("FTF"),
                     vidclicks("TBL") - vidorders("TBL")),
                   c(someorders("FTF"), vidorders("TBL")))
chisq.test(click.mat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  click.mat
## X-squared = 3.565, df = 1, p-value = 0.059

click.mat <- cbind(c(someclicks("FTF") - someorders("FTF"),
                     vidclicks("MYM") - vidorders("MYM")),
                   c(someorders("FTF"), vidorders("MYM")))
chisq.test(click.mat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  click.mat
## X-squared = 1.074, df = 1, p-value = 0.3001

click.mat <- cbind(c(vidclicks("TBL") - vidorders("TBL"),
                     vidclicks("MYM") - vidorders("MYM")),
                   c(vidorders("TBL"), vidorders("MYM")))
chisq.test(click.mat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  click.mat
## X-squared = 0.2223, df = 1, p-value = 0.6373
```
Ads
Since we do not expect the same ads to be used repeatedly over time and we do not have detailed information about the characteristics of the ads, pairwise comparisons of their effectiveness would be more than we need. Instead, we are interested in the overall amount of variation in effectiveness between ads, and in variation in effectiveness due to characteristics of the ads that are known, in this case whether they were shown on Facebook or on blogs.
```r
# One (clicks - orders, orders) row per ad, across all fifteen ads
ads <- unique(Videos$Ad)
clicks2 <- sapply(ads, adclicks)
orders2 <- sapply(ads, adorders)
test.mat <- cbind(clicks2 - orders2, orders2)
chisq.test(test.mat)
## Warning: Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  test.mat
## X-squared = 87, df = 14, p-value = 1.394e-12
```
We got a warning that the chi-square test may not be reliable in this case, probably because the expected values for too many cells are too small. We’ll try running the test on the two subgroups of ads (Facebook ads and BlogAds ads) in hopes that one may have large enough expected values for the test to be reliable.
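Although the original analysis does not show it, the expected counts behind the warning can be inspected directly from the test object; a common rule of thumb is that the chi-square approximation becomes suspect when many cells have expected counts below 5:

```r
# Expected cell counts under the null hypothesis of equal order rates;
# small values here (conventionally, below 5) are what trigger the warning
suppressWarnings(chisq.test(test.mat))$expected
```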
```r
# The six ads run on Facebook (assuming a Source column marks each ad's platform)
fbads <- unique(Videos$Ad[which(Videos$Source == "Facebook")])
clicksfb <- sapply(fbads, adclicks)
ordersfb <- sapply(fbads, adorders)
fb.mat <- cbind(clicksfb - ordersfb, ordersfb)
chisq.test(fb.mat)
## 
##  Pearson's Chi-squared test
## 
## data:  fb.mat
## X-squared = 35.44, df = 5, p-value = 1.228e-06
```
The Facebook ads were each run often enough to make the chi-square test appropriate for this data, and the variation between ads run on Facebook appears to be significant. We don’t know from this whether there are ads of unusually high effectiveness, unusually low effectiveness, or both, but we do know that effectiveness varies in some way.
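One way to see where that variation lies, though it is not part of the original analysis, is to inspect the standardized residuals of the test; ads whose residuals in the orders column are far from zero are performing unusually well or unusually poorly:

```r
# Standardized residuals: values well above zero in the orders column flag ads
# with unusually high order rates, values well below zero unusually low ones
chisq.test(fb.mat)$stdres
```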
```r
# The nine ads run through BlogAds (again assuming a Source column)
blads <- unique(Videos$Ad[which(Videos$Source == "BlogAds")])
clicksbl <- sapply(blads, adclicks)
ordersbl <- sapply(blads, adorders)
bl.mat <- cbind(clicksbl - ordersbl, ordersbl)
chisq.test(bl.mat)
## Warning: Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  bl.mat
## X-squared = 6.032, df = 8, p-value = 0.6437
```
The warning about the reliability of the chi-square test returns when we look at the group of ads run through BlogAds, since this group contains the ads that were clicked fewer times, the same ads that made the larger group comparison unreliable in the first place.
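When expected counts are too small for the asymptotic test, one standard alternative, not used in the original analysis, is chisq.test’s Monte Carlo option, which simulates the p value rather than relying on the chi-square distribution:

```r
# Estimate the p value from 10,000 simulated tables with the same margins,
# avoiding the unreliable asymptotic approximation for this sparse data
chisq.test(bl.mat, simulate.p.value = TRUE, B = 10000)
```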
Now we will test for variation between the two groups of ads we just considered, to see whether the choice of system for ad placement affects the average effectiveness of the ad, and thus what we should be willing to pay per click.
```r
# Pool clicks and orders within each platform and compare the two totals
fbclicks <- sum(clicksfb)
blclicks <- sum(clicksbl)
fborders <- sum(ordersfb)
blorders <- sum(ordersbl)
clicks4 <- c(fbclicks, blclicks)
orders4 <- c(fborders, blorders)
test4.mat <- cbind(clicks4 - orders4, orders4)
chisq.test(test4.mat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  test4.mat
## X-squared = 45.44, df = 1, p-value = 1.57e-11
```
There are differences in effectiveness between the ads shown on Facebook and the ads shown through BlogAds. Since the differences do not seem to be attributable to chance and the study did not report any systematic difference in the type of ads shown on the two platforms, it appears that the platform influences the effectiveness of the ad.
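As a closing illustration (an extension of the analysis rather than part of it): if a literature order is worth the same fixed amount on either platform, then the price worth paying per click scales directly with each platform’s order rate, which we can compute from the totals above:

```r
# Order rate per click on each platform, from the pooled totals computed above
fbrate <- fborders / fbclicks
blrate <- blorders / blclicks
# Ratio of the two rates: how many times more a click is worth on the
# higher-rate platform, given a fixed value per literature order
fbrate / blrate
```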