There are several ways to do A/B testing with Google Analytics. The first is the Google way, which Google calls [Content Experiments](https://support.google.com/analytics/answer/1745147?hl=en). However, you have to define a goal when you set up an experiment.

If you want to investigate several metrics, I think the better way is to do the analysis yourself: use Google Analytics to collect the data, then analyse it on your own.

But first, let me summarize what I want to do:

Say you have two different versions of your website, e.g. one is plain black and white and the other is very colourful. Or the first one uses one font and the second uses another. When a user visits the website for the first time, your web server flips a virtual coin and delivers the website with feature A or feature B. After that, every request made in this session gets the same version. Then you can ask “Which version has the lower bounce rate?” or “On which version do users stay longer?”.
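
The coin flip and the "same version for the rest of the session" rule could be sketched roughly as follows. This is only an illustration under assumptions, not the setup used for this post: the function name `assignVersion`, the `session` object and the injectable `random` parameter are hypothetical, and a real web server would keep the value in its own session store or in a cookie.

```javascript
// Hypothetical helper: choose a version once per session and reuse it afterwards.
// `session` stands in for whatever per-session storage the web server offers.
function assignVersion(session, random = Math.random) {
  if (session.testVersion === undefined) {
    // Virtual coin flip: each version is chosen with probability 0.5.
    session.testVersion = random() < 0.5 ? 'versionA' : 'versionB';
  }
  return session.testVersion;
}

// Usage: the first request flips the coin, later requests reuse the result.
const session = {};
const first = assignVersion(session);
const second = assignVersion(session);
console.log(first, first === second);
```

Passing the random source as a parameter is just a convenience so the assignment can be tested deterministically.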

Setup in Google Analytics

First you have to create a custom dimension to distinguish the users who were shown each version. You do this in the “Admin” menu of your Google Analytics account:

[Figure: Setting up a custom dimension]

Here I name my custom dimension “test-version”. It’s my first custom dimension, so it gets index 1 and is referred to as dimension1 in the code below:

[Figure: custom dimension]

The scope of your new custom dimension determines how long a value of this dimension applies. You can set it to hit (just this page view), to session (the whole session) or to user. Here we need a session-scoped dimension. (See Google’s docs for detailed information.)

You can define up to 20 custom dimensions. (If you’d like to use more, you have to buy Google Analytics Premium.)

Delivering the data

After creating the new custom dimension you have to provide the correct value every time a user hits your page. Just put the line

 ga('set', 'dimension1', dimensionValue);

into the analytics tracking code.

<script>
 (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
 (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
 m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
 })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

 ga('create', 'UA-1234567890-1', 'auto');
 ga('set', 'dimension1', dimensionValue);
 ga('send', 'pageview');

</script>

“dimensionValue” is either “versionA” / “versionB” or “yes” / “no”. You have to set it according to the version of the website that is delivered to the user.
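
One way to fill in the value is to let the server write the finished line into the page it delivers. The sketch below is an assumption about how that could look, not part of the original setup: the helper name `dimensionLine` and the list of allowed values are hypothetical.

```javascript
// Hypothetical helper: build the ga('set', ...) line for the delivered version.
function dimensionLine(version, index = 1) {
  const allowed = ['versionA', 'versionB'];
  if (!allowed.includes(version)) {
    throw new Error('unknown version: ' + version);
  }
  // JSON.stringify produces a safely quoted JavaScript string literal.
  return "ga('set', 'dimension" + index + "', " + JSON.stringify(version) + ");";
}

console.log(dimensionLine('versionA'));
// prints: ga('set', 'dimension1', "versionA");
```

Restricting the value to a fixed list keeps stray values out of the reports.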

Analysis

Using the RGoogleAnalytics package to retrieve the data, you can analyse the different versions of your website in R.

With the following query you can get various metrics split by day and our custom dimension:

library(RGoogleAnalytics)

# start.date, end.date, table.id, cache and token must be defined beforehand.
query.list <- Init(start.date = start.date,
                   end.date = end.date,
                   dimensions = "ga:date,ga:dimension1",
                   metrics = "ga:sessions,ga:bounces,ga:bounceRate,ga:pageviews,ga:pageviewsPerSession,ga:sessionDuration,ga:avgSessionDuration,ga:avgPageLoadTime",
                   max.results = 10000,
                   table.id = table.id,
                   caching.dir = "cache",
                   caching = cache)

# Build the query and fetch the data.
ga.query <- QueryBuilder(query.list)
data.perDay <- GetReportData(ga.query, token, split_daywise = TRUE, delay = 0)

Plotting the data with ggplot2

library(ggplot2)
ggplot(data = data.perDay, aes(x = as.Date(date, format = "%Y%m%d"), y = bounceRate, color = dimension1)) + geom_line()

[Figure: bounceRate vs. time and version]

But what about statistical significance?

So let’s get the data:

query.list <- Init(start.date = start.date,
                   end.date = end.date,
                   dimensions = "ga:dimension1",
                   metrics = "ga:sessions,ga:bounces",
                   max.results = 10000,
                   table.id = table.id,
                   caching.dir = "cache",
                   caching = cache)
 
ga.query <- QueryBuilder(query.list)
data <- GetReportData(ga.query, token, split_daywise = FALSE, delay = 0)

With this data

  dimension1 sessions bounces 
1  version A    684   359 
2  version B    678   394

we can perform a chi-squared test:

library(dplyr)

# Build the 2x2 table (non-bounces vs. bounces per version) and test it.
data %>%
    mutate(noBounces = sessions - bounces) %>%
    select(noBounces, bounces) %>%
    chisq.test()

The output shows that the difference in bounce rate (about 52.5% for version A vs. 58.1% for version B) is significant at the 5% level:

	Pearson's Chi-squared test with Yates' continuity correction

data:  .
X-squared = 4.1361, df = 1, p-value = 0.04198
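
As a cross-check, the test statistic can be recomputed by hand from the four counts. The sketch below uses the standard closed form of the Yates-corrected chi-squared statistic for a 2x2 table; the function name `yatesChiSquared` is made up for this illustration. It compares the statistic with 3.841, the 5% critical value for one degree of freedom, rather than computing a p-value.

```javascript
// Chi-squared statistic with Yates' continuity correction for the 2x2 table
// [[a, b], [c, d]].
function yatesChiSquared(a, b, c, d) {
  const n = a + b + c + d;
  // The correction subtracts n/2 from |ad - bc|, but never goes below zero.
  const corrected = Math.max(0, Math.abs(a * d - b * c) - n / 2);
  return (n * corrected * corrected) / ((a + b) * (c + d) * (a + c) * (b + d));
}

// Counts from the table above: sessions and bounces per version.
const sessionsA = 684, bouncesA = 359;
const sessionsB = 678, bouncesB = 394;

const statistic = yatesChiSquared(
  sessionsA - bouncesA, bouncesA,
  sessionsB - bouncesB, bouncesB
);
const criticalValue = 3.841; // 5% level, one degree of freedom

console.log(statistic.toFixed(4), statistic > criticalValue);
// prints: 4.1361 true
```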

For a more complex scenario, see part 2.