Last week I’ve visualized the spread of Covid-19 through Germany using a map-plot.

Now I was asking myself if there’s a better way to show that the rising number of infections these day are other than during the local spreadings in June.

So now I want to plot all of the German Landkreise (similar to counties in the U.S.) regarding the number of new infections during the last 7 days per 100,000 residents using a histogram.

Getting the data

Once again I’m using the data provided by Pavel Mayer.

I’m processing it mostly the same way I did last week.

1
2
3
4
5
6
7
8
# Load Libraries
suppressMessages(library(tidyverse))
suppressMessages(library(lubridate))
suppressMessages(library(zoo))
suppressMessages(library(scales))

# Suppress summarise info
options(dplyr.summarise.inform = FALSE)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
# Fetch the data
url_landkreise_full <- "https://pavelmayer.de/covid/risks/full-data.csv"

if(file.exists("data_landkreise_detail-histogram.Rda")){
  load("data_landkreise_detail-histogram.Rda")
} else{
  data_landkreise_detail <- read_csv(url_landkreise_full, col_types = cols(
    Bundesland = col_character(),
    Landkreis = col_character(),
    Altersgruppe = col_character(),
    Geschlecht = col_character(),
    IdLandkreis = col_character(),
    Datenstand = col_character(),
    Altersgruppe2 = col_character(),
    LandkreisName = col_character(),
    LandkreisTyp = col_character(),
    NeuerFallKlar = col_character(),
    RefdatumKlar = col_character(),
    MeldedatumKlar = col_character(),
    NeuerTodesfallKlar = col_character(),
    missingSinceDay = col_integer(),
    missingCasesInOldRecord = col_integer(),
    poppedUpOnDay = col_integer()
    )
  )
  save(data_landkreise_detail, file = "data_landkreise_detail-histogram.Rda")
}

# Set locale to German because there's the German notation of the weekday used
Sys.setlocale(category = "LC_ALL", locale = "de_DE.UTF-8")
1
## [1] "de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/C"
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
# change MeldedatumKlar to date
data_landkreise_detail_converted <- data_landkreise_detail %>%
  mutate(
    MeldedatumKlar = as.Date(
      strptime(MeldedatumKlar, format = "%a, %d.%m.%Y %H:%M")
    )
  )

landkreise <- data_landkreise_detail_converted %>%
  select(Landkreis, IdLandkreis, Bevoelkerung) %>%
  unique()

data_landkreise_per_day <- data_landkreise_detail_converted %>%
  arrange(MeldedatumKlar) %>%
  group_by(Landkreis, MeldedatumKlar) %>%
  summarize(
    infected = sum(AnzahlFall)
  ) %>%
  ungroup() %>%
  complete(MeldedatumKlar = seq(min(MeldedatumKlar), max(MeldedatumKlar), by = "day"), Landkreis, fill = list(infected = 0)) %>%
  arrange(MeldedatumKlar) %>%
  group_by(Landkreis) %>%
  mutate(
    infected_7 = rollsum(infected, 7, fill = NA, align = "right")
  ) %>% 
  left_join(landkreise) %>% 
  mutate(
    infected_7_per_100k = infected_7/ Bevoelkerung * 100000
  ) 
1
## Joining, by = "Landkreis"

As you can see I’m using complete in a slightly different way to ensure to get every date from the very beginning to the last one.

Now let’s prepare the data to fit into a histogram with 0 to 50+ infections on the x-axis. Therefor I set each landkreis with more than 50 infections to 50.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
data_landkreise_cropped <- data_landkreise_per_day %>% 
  filter(MeldedatumKlar >= ymd("2020-03-01")) %>% 
  rowwise() %>% 
  mutate(
    infected_7_per_100k_plot = min(infected_7_per_100k, 50)
  ) %>% 
  ungroup()

get_histogram <- function(data) {
  ggplot(data) +
  geom_histogram(aes(x = infected_7_per_100k_plot, fill = ..x..), bins = 50) +
  scale_fill_gradient2(midpoint = 15, low = "green", mid = "yellow",
                          high = "red", space = "Lab", limits=c(0, 50), oob=squish,
                         name = "Number of infections per 100k (50 is 50+)") +
  guides(fill = FALSE) +
  xlab("New Infections in last 7 days per 100,000 residents") +
  ylab("Number of Landkreise") +
  theme_bw()
}

The result

So let’s compare the June event with the current data:

1
2
get_histogram(data_landkreise_cropped %>% filter(MeldedatumKlar %in% c(ymd("2020-06-23"), ymd("2020-08-14")))) +
  facet_wrap(~MeldedatumKlar) 

As you can see the peak in August is much broader with more Landkreise with more infections. In June on the other hand there were a lot more Landkreise with a few or no infections.

The whole pandemic

Let’s do an animation of the whole pandemic again.

1
2
3
4
5
6
7
library(gganimate)
anim <- get_histogram(data_landkreise_cropped) + 
  transition_manual(MeldedatumKlar) +
  labs(title = "Date: {current_frame}")

animated_histogram <- animate(anim, fps = 5, end_pause = 15)
anim_save("histogram-all-year.gif", animated_histogram)

Note

I’ve got a comment to my last visualization to use different colours such as the viridis scale because red-green colour blindness is very common.

So I tried it here.

1
2
3
get_histogram(data_landkreise_cropped %>% filter(MeldedatumKlar %in% c(ymd("2020-06-23"), ymd("2020-08-14")))) +
  facet_wrap(~MeldedatumKlar) +
   scale_fill_viridis_c()
1
2
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.

But I think in this special case green - yellow - red is the better choice because even if you can’t distinghuish between them the position in the plot will be enough to get the right value.

However, the maps of last week would be better with viridis.