Everyone in Germany is talking about the second wave of Covid-19. Has it already
arrived, or is it still coming?
The number of new Covid-19 cases has been rising for quite a few weeks. According to
Johns Hopkins University there were more than 1,000 new cases per day in recent days.
That was also the case when there was only a local hotspot in Gütersloh and Warendorf.
But this time there isn’t one hotspot. The infected people live all over
Germany.
So I want to visualize the spread.
The data
Pavel Mayer produces a very detailed table with Covid-19 data per Landkreis every day.
(A Landkreis is a sub-area of a Bundesland, comparable to a county of a state in the U.S.)
He uses data provided by the German Robert-Koch-Institut (RKI), adds some data from other
sources, and tries to clean up the data. He also offers the data for download.
I’m using this data for my visualization. I can’t get exactly the same numbers as Pavel
does, and I’m not sure what the reason for the difference is: I’m using the date a case
was reported; maybe he uses the date of infection.
Update 2020-08-14: The reason for the difference between my calculated values and
Pavel’s is a different date: I’m calculating the number of infections that were detected
on a particular day (MeldedatumKlar). Pavel refers his calculated infections to the day
they are reported by the RKI (Datenstand). Note: there may be a delay of several days
between testing and getting the positive result to the local health department on the one
hand, and sending this data to the RKI on the other. They still use fax in 2020!
So let’s start:
# Load Libraries
suppressMessages(library(tidyverse))
suppressMessages(library(lubridate))
suppressMessages(library(zoo))
# Suppress summarise info
options(dplyr.summarise.inform = FALSE)
# Fetch the data
url_landkreise_full <- "https://pavelmayer.de/covid/risks/full-data.csv"
if(file.exists("data_landkreise_detail.Rda")){
load("data_landkreise_detail.Rda")
} else{
data_landkreise_detail <- read_csv(url_landkreise_full, col_types = cols(
Bundesland = col_character(),
Landkreis = col_character(),
Altersgruppe = col_character(),
Geschlecht = col_character(),
IdLandkreis = col_character(),
Datenstand = col_character(),
Altersgruppe2 = col_character(),
LandkreisName = col_character(),
LandkreisTyp = col_character(),
NeuerFallKlar = col_character(),
RefdatumKlar = col_character(),
MeldedatumKlar = col_character(),
NeuerTodesfallKlar = col_character(),
missingSinceDay = col_integer(),
missingCasesInOldRecord = col_integer(),
poppedUpOnDay = col_integer()
)
)
save(data_landkreise_detail, file = "data_landkreise_detail.Rda")
}
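The block above caches the downloaded table in a local .Rda file and simply loads that cache on later runs. To force a fresh download, the cache has to be removed first; a tiny, purely optional helper:
# Optional: remove the local cache so the next run downloads the data again
if (file.exists("data_landkreise_detail.Rda")) {
  file.remove("data_landkreise_detail.Rda")
}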
The date when an infection is reported is coded in MeldedatumKlar, so let's transform it into a date column:
# Set locale to German because the weekday abbreviations in the data use German notation
Sys.setlocale(category = "LC_ALL", locale = "de_DE.UTF-8")
## [1] "de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/C"
# change MeldedatumKlar to date
data_landkreise_detail_converted <- data_landkreise_detail %>%
mutate(
MeldedatumKlar = as.Date(
strptime(MeldedatumKlar, format = "%a, %d.%m.%Y %H:%M")
)
)
We want to visualize the infections during the last 7 days. So let’s compute them.
data_landkreise_per_day <- data_landkreise_detail_converted %>%
arrange(MeldedatumKlar) %>%
group_by(Landkreis, MeldedatumKlar) %>%
summarize(
infected = sum(AnzahlFall)
) %>%
ungroup() %>%
complete(MeldedatumKlar, Landkreis, fill = list(infected = 0)) %>%
arrange(MeldedatumKlar) %>%
group_by(Landkreis) %>%
mutate(
infected_7 = rollsum(infected, 7, fill = NA, align = "right")
)
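To see what the rolling sum does, here is a quick toy example with made-up numbers: rollsum() with align = "right" and fill = NA adds up the current value plus the six preceding ones and leaves the first six positions empty.
# Toy example: trailing 7-day sums over the numbers 1 to 10
zoo::rollsum(1:10, 7, fill = NA, align = "right")
##  [1] NA NA NA NA NA NA 28 35 42 49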
In Germany there is a rule that each Landkreis with more than 50 newly infected
people per 100,000 residents during the last 7 days must take measures such as restricting
contact between people or closing parks, bars, restaurants, etc. So we need the number of residents
per Landkreis.
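As a quick sanity check with purely hypothetical numbers: a Landkreis with 250,000 residents and 140 newly reported cases in the last 7 days would be above the threshold.
# Hypothetical example: 140 cases within 7 days in a Landkreis of 250,000 residents
140 / 250000 * 100000
## [1] 56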
Pavel already provides this data in his dataset. He also provides an id (IdLandkreis) for each
Landkreis, which we will need later. So let's extract these two fields from the
original data and merge them into the summarized data frame:
landkreise <- data_landkreise_detail_converted %>%
select(Landkreis, IdLandkreis, Bevoelkerung) %>%
unique()
data_landkreise_per_day_per_100k <- data_landkreise_per_day %>%
left_join(landkreise) %>%
mutate(
infected_7_per_100k = infected_7/ Bevoelkerung * 100000
)
## Joining, by = "Landkreis"
A first map
So now we want to plot this data on a map.
ggplot2 provides some maps, such as map_data("world"). Unfortunately, German Landkreise
aren't (yet) available.
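For comparison, this is roughly what the built-in map offers (assuming the maps package, which backs map_data(), and mapproj for coord_map() are installed): only the country outline, without any Landkreise.
# Germany as contained in ggplot2's map_data("world") - country outline only
germany_outline <- map_data("world") %>% filter(region == "Germany")
ggplot(germany_outline, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey90", color = "black") +
  coord_map()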
So we need to fetch them on our own:
# Don't load the whole raster library because it overwrites the select method from dplyr.
# library(raster)
suppressMessages(library(rgeos))
suppressMessages(library(maptools))
suppressMessages(library(scales))
suppressMessages(library(ggmap))
# Fetch data for Landkreise
landkreise.shp <- raster::getData("GADM", country = "DEU", level = 2)
landkreise.shp.f <- fortify(landkreise.shp, region = "CC_2")
# Fetch data for Bundesländer
laender.shp <- raster::getData("GADM", country = "DEU", level = 1)
Now we need to subset our data to a specific date. Let’s have a look at 2020-03-27:
plot_date <- ymd("2020-03-27")
mydata <- data_landkreise_per_day_per_100k %>%
filter(MeldedatumKlar == plot_date) %>%
rename(id = IdLandkreis)
## merge shape file with data
merge.shp.coef <- full_join(landkreise.shp.f, mydata, by = "id")
Now we can plot the data.
single_map <- function(merge.shp.coef, landkreise.shp.f){
ggplot() +
geom_polygon(data = merge.shp.coef, aes(x = long, y = lat, group = group,
fill = infected_7_per_100k), color = "black", size = 0.25) +
geom_polygon(data = laender.shp, aes(x = long, y = lat, group = group),
alpha = 0, color = "black", size = 1.0) +
coord_map() +
scale_fill_gradient2(midpoint = 15, low = "green", mid = "yellow",
high = "red", space = "Lab", limits=c(0, 50), oob=squish,
name = "Number of infections per 100k (50 is 50+)") +
theme_nothing(legend = TRUE) +
labs(
      title = paste0("Infections per 100,000 residents during the last 7 days at ",
format(plot_date, format = "%m/%d/%Y"))
)
}
single_map(merge.shp.coef, landkreise.shp.f)
## Regions defined for each Polygons
Adjustments for Berlin and Göttingen
Wow, great. But if we take a closer look we see three gray parts:
First, Berlin is gray. The reason is that in Pavel's data Berlin is split into
several parts, so we have to combine them for plotting.
The other two gray parts are Göttingen in Lower Saxony. I'm not sure where the error is.
Wikipedia lists 03159 as the id for Landkreis Göttingen, as Pavel does, but R fetches two
Landkreise: 03152 and 03156 (Göttingen and Osterode am Harz,
https://gadm.org/maps/DEU/niedersachsen_2.html).
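One quick way to spot such mismatches is to compare the ids in the shapefile with the ids in the case data; this is just a diagnostic sketch, and the exact ids returned depend on the data at hand.
# Ids present in the shapefile but missing from the case data, and vice versa
setdiff(unique(landkreise.shp.f$id), unique(data_landkreise_per_day_per_100k$IdLandkreis))
setdiff(unique(data_landkreise_per_day_per_100k$IdLandkreis), unique(landkreise.shp.f$id))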
# Fix for Göttingen
landkreise.shp.f <- landkreise.shp.f %>%
mutate(
id = if_else(id == "03152", "03159", id), # fix for Göttingen
id = if_else(id == "03156", "03159", id) # fix for Göttingen
)
# Fix for Berlin
data_landkreise_per_day_per_100k <- data_landkreise_per_day_per_100k %>%
mutate(
Landkreis = if_else(grepl("Berlin", Landkreis), "Berlin", Landkreis)
) %>%
group_by(MeldedatumKlar, Landkreis) %>%
summarize(
infected = sum(infected, na.rm = TRUE),
infected_7 = sum(infected_7, na.rm = TRUE),
Bevoelkerung = sum(Bevoelkerung),
IdLandkreis = first(IdLandkreis)
) %>%
mutate(
IdLandkreis = if_else(Landkreis == "Berlin", "11000", IdLandkreis),
infected_7_per_100k = infected_7/ Bevoelkerung * 100000
)
# Merge data again
plot_date <- ymd("2020-03-27")
mydata <- data_landkreise_per_day_per_100k %>%
filter(MeldedatumKlar == plot_date) %>%
rename(id = IdLandkreis)
## merge shape file with data
merge.shp.coef <- full_join(landkreise.shp.f, mydata, by = "id")
single_map(merge.shp.coef, landkreise.shp.f)
## Regions defined for each Polygons
Okay, no more blind spots.
Animation for the last 4 days
Let’s use gganimate
to plot different days into one map and generate an animated
gif.
library(gganimate)
plot_date <- max(data_landkreise_per_day_per_100k$MeldedatumKlar)
days_of_animation <- 3
mydata <- data_landkreise_per_day_per_100k %>%
filter(MeldedatumKlar >= plot_date - days(days_of_animation)) %>%
rename(id = IdLandkreis)
## merge shape file with data
merge.shp.coef <- full_join(landkreise.shp.f, mydata, by = "id")
animated_map <- single_map(merge.shp.coef, landkreise.shp.f) +
transition_manual(MeldedatumKlar) +
labs(title = "Date: {current_frame}")
## Regions defined for each Polygons
anim_save("map-3-days.gif", animated_map)
## nframes and fps adjusted to match transition
Animate the whole pandemic
The more days are animated, the more memory is needed to compute the gif. I tried
to animate all data since the start of the pandemic, but my 16 GB of memory weren't
enough.
So I used only every fourth day and waited several minutes:
library(gganimate)
plot_dates <- seq(min(data_landkreise_per_day_per_100k$MeldedatumKlar), max(data_landkreise_per_day_per_100k$MeldedatumKlar), by = 4)
mydata <- data_landkreise_per_day_per_100k %>%
filter(MeldedatumKlar %in% plot_dates) %>%
rename(id = IdLandkreis)
## merge shape file with data
merge.shp.coef <- full_join(landkreise.shp.f, mydata, by = "id")
animated_map <- single_map(merge.shp.coef, landkreise.shp.f) +
transition_manual(MeldedatumKlar) +
labs(title = "Date: {current_frame}")
## Regions defined for each Polygons
# anim_save("map-all-year.gif", animated_map)
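Besides skipping days, memory and rendering time can also be reduced by rendering the animation explicitly at a smaller device size before saving it. This is only a sketch; the frame count, speed and resolution are a matter of taste, and the file name is made up:
# Render with an explicit frame count and a smaller device, then save to a new file
anim <- animate(animated_map, nframes = length(plot_dates), fps = 2,
                width = 600, height = 600)
anim_save("map-all-year-small.gif", anim)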
Credits
First, I'd like to thank Pavel for the work he did creating his daily table.
I learned a lot about creating maps with ggplot2 by following the steps on Thilo Klein's page.