When working with data provided by common APIs you will almost always get in contact with JSON formatted data. Using R’s rjson::fromJSON
will transform JSON to R’s lists.
So far so good. Converting those lists to tibble
using tibble::as_tibble
will fail when the JSON (and therefor the list) contains NULL values. So you havve to replace them before building the tibble.
Example
But let’s start with an example:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
|
library(tidyverse)
library(rjson)
data_json <- '
[
{
"name": "Tim",
"age": 34,
"hobby": "footbal"
},
{
"name": "Tom",
"age": 22,
"hobby": "baseball"
},
{
"name": "Shelly",
"age": 21,
"hobby": "cycling"
}
]
'
parsed_json <- rjson::fromJSON(data_json)
parsed_json %>% str()
|
1
2
3
4
5
6
7
8
9
10
11
12
13
|
## List of 3
## $ :List of 3
## ..$ name : chr "Tim"
## ..$ age : num 34
## ..$ hobby: chr "footbal"
## $ :List of 3
## ..$ name : chr "Tom"
## ..$ age : num 22
## ..$ hobby: chr "baseball"
## $ :List of 3
## ..$ name : chr "Shelly"
## ..$ age : num 21
## ..$ hobby: chr "cycling"
|
In in this example we parse the JSON using rjson::fromJSON()
into a list of lists. Each sublist contains the value of a person.
We can convert this list of lists into a tibble
with some purrr::map_dfr()
magic:
1
2
|
parsed_json %>%
purrr::map_dfr(~ tibble::as_tibble(.x))
|
1
2
3
4
5
6
|
## # A tibble: 3 × 3
## name age hobby
## <chr> <dbl> <chr>
## 1 Tim 34 footbal
## 2 Tom 22 baseball
## 3 Shelly 21 cycling
|
Example with null-values
But what happens if there are null-values in the JSON file?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
|
data_json <- '
[
{
"name": "Tim",
"age": 34,
"hobby": "footbal"
},
{
"name": "Tom",
"age": null,
"hobby": "baseball"
},
{
"name": "Shelly",
"age": 21,
"hobby": null
}
]
'
parsed_json <- rjson::fromJSON(data_json)
parsed_json %>% str()
|
1
2
3
4
5
6
7
8
9
10
11
12
13
|
## List of 3
## $ :List of 3
## ..$ name : chr "Tim"
## ..$ age : num 34
## ..$ hobby: chr "footbal"
## $ :List of 3
## ..$ name : chr "Tom"
## ..$ age : NULL
## ..$ hobby: chr "baseball"
## $ :List of 3
## ..$ name : chr "Shelly"
## ..$ age : num 21
## ..$ hobby: NULL
|
As you can see the age
of the second entry and the hobby of the third one are null
. They get parsed to NULL
.
These lead to an error when converting the list of list to a tibble:
1
2
|
parsed_json %>%
purrr::map_dfr(~ tibble::as_tibble(.x))
|
1
2
3
4
5
|
## Error in `map()`:
## ℹ In index: 2.
## Caused by error in `tibble::as_tibble()`:
## ! All columns in a tibble must be vectors.
## ✖ Column `age` is NULL.
|
Solution
So I suggest to use a custom function which converts NULL
values to NAs and then converts the entry to a tibble
1
2
3
4
5
6
7
8
|
fix_NULL <- function(variables) {
variables <- variables %>%
map(~ ifelse(is.null(.x), NA, .x))
tibble::as_tibble(variables)
}
parsed_json %>%
purrr::map_dfr(~ fix_NULL(.x))
|
1
2
3
4
5
6
|
## # A tibble: 3 × 3
## name age hobby
## <chr> <dbl> <chr>
## 1 Tim 34 footbal
## 2 Tom NA baseball
## 3 Shelly 21 <NA>
|