Lists to Data.Frames with imap

When working with data that started out as JSON and was converted to a list of lists of lists of lists … (you know what I mean ;-)), I often want to convert it to a data.frame.

Unfortunately, the source data often contains a list which is unnamed, or the list in one row is longer than the one in another row. Converting it straight into a data.frame or tibble then fails with the error message Tibble columns must have compatible sizes.

So what to do? Just leave lists as values in the cells of the data.frame.

Let’s have a look at some sample data:

Sample data

options(tidyverse.quiet = TRUE)
library(tidyverse)

row_1 <- list(
  a = 42, 
  b = list("one", "two", "three", "four"),
  c = list("R", "python")
)

row_2 <- list(
  a = 3.14159, 
  b = list("A", "B"),
  c = list("Montana", "Ohio", "California")
)

source <- list(row_1, row_2)

So we have a list source which contains two entries, row_1 and row_2, each of which is a list in its own right.
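To see the problem described above with this data, here is a minimal sketch that tries the direct conversion of a single row and catches the error. The variable `attempt` is only introduced for illustration; the row is repeated so the snippet runs on its own.

```r
library(tibble)

# Same content as row_1 above, repeated so the snippet is self-contained
row_1 <- list(
  a = 42,
  b = list("one", "two", "three", "four"),
  c = list("R", "python")
)

# as_tibble() treats b (4 elements) and c (2 elements) as columns of
# different sizes, so the conversion fails; tryCatch() keeps the error object
attempt <- tryCatch(as_tibble(row_1), error = function(e) e)

inherits(attempt, "error")
conditionMessage(attempt)
```

The direct conversion reads every list element as one row of its column, which is why the lists have to be wrapped before they can live in a single cell.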

Goal

As a result we want to get a data.frame (or tibble):

target <- tribble(
  ~a, ~b, ~c,
  42, list("one", "two", "three", "four"), list("R", "python"),
  3.14159, list("A", "B"), list("Montana", "Ohio", "California")
)

target

## # A tibble: 2 × 3
##       a b          c         
##   <dbl> <list>     <list>    
## 1 42    <list [4]> <list [2]>
## 2  3.14 <list [2]> <list [3]>

purrr::imap

Let’s start with a single row.

The idea is to iterate over each element of row_1, so the purrr::map* family of functions seems the obvious choice. But these functions only iterate over the values of the list; they don’t pass the name of each element.
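A small sketch of the difference, using a hypothetical list `x` that is not part of the sample data: `map_chr()` only sees the values, while the indexed variant `imap_chr()` also receives each name (or the position, if the list is unnamed) as `.y`.

```r
library(purrr)

# Hypothetical toy list, only for illustration
x <- list(a = 1, b = 2)

# map_* passes only the value (.x) to the function
values_only <- map_chr(x, ~ paste("value:", .x))

# imap_* passes the value (.x) and the name (.y)
with_names <- imap_chr(x, ~ paste(.y, "=", .x))

# For an unnamed list, .y is the position instead of a name
with_index <- imap_chr(list("x", "y"), ~ paste(.y, .x))

values_only
with_names
with_index
```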

So we need purrr::imap. It passes two arguments to the processing function, the value (.x) and the name (.y):

row_1 %>% 
  purrr::imap_dfc(~ tibble({{.y}} := list(.x)))

## # A tibble: 1 × 3
##   a         b          c         
##   <list>    <list>     <list>    
## 1 <dbl [1]> <list [4]> <list [2]>

Okay, that seems pretty good. But the first column shouldn’t be a list. Here we want a normal column, so we wrap a value in a list only if it has more than one element.

row_1 %>% 
  purrr::imap_dfc(~ tibble({{.y}} := ifelse(length(.x) > 1, list(.x), .x)))

## # A tibble: 1 × 3
##       a b          c         
##   <dbl> <list>     <list>    
## 1    42 <list [4]> <list [2]>

That’s really nice. So how do we process the whole list source? We wrap the same call in another purrr::map* function, which builds one row per entry and binds the rows together.

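result <- source %>% 
  # outer loop: one entry of source becomes one row, rows are bound together
  purrr::map_dfr(
    # inner loop: one element of the row becomes one column
    ~.x %>% purrr::imap_dfc(~ tibble({{.y}} := ifelse(length(.x) > 1, list(.x), .x)))
  )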
result

## # A tibble: 2 × 3
##       a b          c         
##   <dbl> <list>     <list>    
## 1 42    <list [4]> <list [2]>
## 2  3.14 <list [2]> <list [3]>
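A variation on the same idea, offered as a sketch rather than a replacement: in purrr 1.0.0 and later the `*_dfr()` and `*_dfc()` variants are marked as superseded in favour of `list_rbind()` and `list_cbind()`, so this version assumes purrr >= 1.0.0. The helpers `wrap_value()` and `row_to_tibble()` and the list `rows` are hypothetical names introduced here. `wrap_value()` checks `is.list()` instead of only the length, an assumption made so that a list with a single element (like `b` in the second row below) still ends up nested in a list column.

```r
library(purrr)
library(tibble)

# Hypothetical helper: keep single atomic values as they are,
# wrap everything else so that it fits into one cell
wrap_value <- function(x) {
  if (is.list(x) || length(x) != 1) list(x) else x
}

# Hypothetical helper: turn one named list into a one-row tibble
row_to_tibble <- function(row) {
  row %>%
    imap(function(value, name) tibble(!!name := wrap_value(value))) %>%
    unname() %>%      # drop the outer names so the columns are bound as-is
    list_cbind()
}

# Variation of the sample data: b in the second row has only one element
rows <- list(
  list(a = 42, b = list("one", "two", "three", "four"), c = list("R", "python")),
  list(a = 3.14159, b = list("A"), c = list("Montana", "Ohio", "California"))
)

result <- rows %>%
  map(row_to_tibble) %>%
  list_rbind()

result
```

Splitting the work into two small functions also keeps the two lambdas from shadowing each other's `.x`, which makes the nested call easier to read.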
