The most widely used tool for tracking activity on a website is Google Analytics. But more and more websites are switching to other tools such as Matomo or PIWIK PRO because of GDPR.

My employer decided to switch to PIWIK PRO, too. So I was looking for a way to access the data PIWIK PRO collects and process it with R. When we used Google Analytics as our web analytics tool, I used RGoogleAnalytics. I added some enhancements, such as caching the data and splitting the requests into daily chunks to handle sampling issues with Google Analytics.

Unfortunately, I couldn’t find any R package providing access to PIWIK PRO data. So I wrote my own: piwikproR.

Here I want to show you how to use it.

Installation

piwikproR isn’t available on CRAN yet. But with devtools, installing it from GitHub is as simple as installing from CRAN:

devtools::install_github("dfv-ms/piwikproR")

Using piwikproR

Credentials

Before we can use the PIWIK PRO API, we have to generate API credentials. This gives us two strings: CLIENT_ID and CLIENT_SECRET.

With these two strings we can generate a token for the actual access. So let’s put the credentials into a list:

library(piwikproR)

piwik_pro_credentials <- list(
  client_id = "CLIENT_ID",
  client_secret = "CLIENT_SECRET",
  url = "https://my_site.piwik.pro"
)

# Fetch token
token <- get_login_token(piwik_pro_credentials)
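To keep CLIENT_ID and CLIENT_SECRET out of the script itself, the list can also be filled from environment variables. The following is a minimal sketch in base R, not part of piwikproR: the helper name credentials_from_env and the variable names PIWIK_PRO_CLIENT_ID and PIWIK_PRO_CLIENT_SECRET are assumptions chosen for this example.

```r
# Hypothetical helper (not part of piwikproR): builds the credentials list
# from environment variables, e.g. set in ~/.Renviron.
# The variable names below are assumptions made for this example.
credentials_from_env <- function(url) {
  creds <- list(
    client_id = Sys.getenv("PIWIK_PRO_CLIENT_ID"),
    client_secret = Sys.getenv("PIWIK_PRO_CLIENT_SECRET"),
    url = url
  )
  # Sys.getenv() returns "" for unset variables, so fail early if one is empty
  missing <- names(creds)[!nzchar(unlist(creds))]
  if (length(missing) > 0) {
    stop("Missing credentials: ", paste(missing, collapse = ", "))
  }
  creds
}
```

The returned list has the same three elements as piwik_pro_credentials above, so it could be passed to get_login_token() in the same way.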

Columns

Now let’s define which columns we want to fetch. To do so, we build a tibble containing the column name and an optional transformation:

columns <- tibble::tribble(
  ~column, ~transformation,
  "timestamp", "",        # no transformation
  "event_url", "to_path", # keep only the path part of the URL
  "page_views", ""
)

In the example above, the first column contains the date and the second the path part of each URL (instead of https://www.my-domain.com/some/path/to/the/site.html we get only /some/path/to/the/site.html). The last column contains the number of page_views. For further details, take a look at the PIWIK PRO documentation.
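To make the effect of to_path concrete, here is a small local illustration in base R. It only mimics the result described above for a simple URL. The transformation itself is applied by the PIWIK PRO server, and the helper url_to_path is a hypothetical name used only for this example.

```r
# Hypothetical helper for illustration only: removes scheme and host
# from a URL and keeps everything from the first "/" of the path onwards.
url_to_path <- function(url) {
  sub("^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]+", "", url)
}

url_to_path("https://www.my-domain.com/some/path/to/the/site.html")
# "/some/path/to/the/site.html"
```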

Filters

Optionally, we can pass a filter to the API call so that the server does the filtering.

Let’s say we’re only interested in page_views generated by desktop devices. So we build the following filter object:

# device_type 0 selects desktop devices
filters <- tibble::tribble(
  ~column, ~operator, ~value,
  "device_type", "eq", 0
)
# Combine the rows of the filter table with "and"
filters <- build_filter(filters, "and")

Adding more rows to filters adds more criteria.

Fetching the data

Now it’s time to fetch the data. We have to choose the date range and the actual website we’re fetching the data for:

website_id <- "my_website_id"
start.date <- "2021-04-01"
end.date <- "2021-04-30"

# Build the query for the chosen date range, website, columns and filter
query <- build_query(
  lubridate::ymd(start.date), lubridate::ymd(end.date), website_id,
  columns,
  filters = filters, max_lines = 0
)
# Send the query using the token fetched above
data <- send_query(query, token, caching = TRUE, fetch_by_day = FALSE)
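The fetch_by_day argument picks up the idea of daily chunks mentioned in the introduction. How piwikproR handles this internally is not shown here. As a rough sketch under that caveat, a date range can be split into one-day chunks in base R like this (split_into_days is a hypothetical helper, not a piwikproR function):

```r
# Hypothetical helper (not part of piwikproR): splits a date range
# into one row per day, with identical start and end date per chunk.
split_into_days <- function(start_date, end_date) {
  days <- seq(as.Date(start_date), as.Date(end_date), by = "day")
  data.frame(from = days, to = days)
}

chunks <- split_into_days("2021-04-01", "2021-04-30")
nrow(chunks)
# 30
```

Each row could then serve as the date range of a separate request, and the daily results could be bound together afterwards.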

The result data is a tibble containing the specified columns.
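From here on, the result can be processed like any other data frame. The sketch below uses a small made-up data frame as a stand-in for the real result. Both the values and the assumption that the result columns are named timestamp, event_url and page_views are illustrative only.

```r
# Made-up stand-in for the result of send_query();
# the column names are an assumption based on the columns requested above.
result <- data.frame(
  timestamp = as.Date(c("2021-04-01", "2021-04-01", "2021-04-02")),
  event_url = c("/index.html", "/about.html", "/index.html"),
  page_views = c(10, 3, 7)
)

# Sum the page views per path and sort by descending total
views_by_path <- aggregate(page_views ~ event_url, data = result, FUN = sum)
views_by_path <- views_by_path[order(-views_by_path$page_views), ]
views_by_path
```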

Documentation

PIWIK PRO provides detailed documentation for its API at https://developers.piwik.pro/en/latest/custom_reports/index.html.