I sometimes have a function which takes some parameters and returns a data.frame
as a result. I also have a data.frame in which each row is one set of parameters.
So I would like to apply the function to each row of the parameter-data.frame
and rbind the resulting data.frames.
There are several ways to do it. Let’s have a look:
The function …
So let’s build a simple function we can use
my_function <- function(repeated = 1, text = "a", number_rows = 2) {
  row <- data.frame(
    repeated = repeated,
    text = text,
    number_rows = number_rows,
    generated_text = paste(replicate(repeated, text), collapse = "")
  )
  return(do.call("rbind", replicate(number_rows, row, simplify = FALSE)))
}
my_function(3, "Hello ", 4)
## repeated text number_rows generated_text
## 1 3 Hello 4 Hello Hello Hello
## 2 3 Hello 4 Hello Hello Hello
## 3 3 Hello 4 Hello Hello Hello
## 4 3 Hello 4 Hello Hello Hello
So this function takes three arguments and returns a data.frame. The number of
rows of the data.frame depends on the last parameter.
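As a quick check of that claim, here is a small sketch that calls the function with different values for number_rows and counts the rows. my_function is repeated from above only so that the block runs on its own; the names four_rows and one_row are chosen for illustration.

```r
# my_function is repeated from above so that this block is self-contained
my_function <- function(repeated = 1, text = "a", number_rows = 2) {
  row <- data.frame(
    repeated = repeated,
    text = text,
    number_rows = number_rows,
    generated_text = paste(replicate(repeated, text), collapse = "")
  )
  do.call("rbind", replicate(number_rows, row, simplify = FALSE))
}

# The last argument controls how many times the row is repeated
four_rows <- my_function(3, "Hello ", 4)
one_row <- my_function(number_rows = 1)  # other arguments take their defaults

nrow(four_rows)  # 4
nrow(one_row)    # 1
```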
… and its parameters
So now we have several tuples of parameters. Each tuple is a row of our parameter-data.frame:
options(tidyverse.quiet = TRUE)
library(tidyverse, warn.conflicts = FALSE)
parameters <- tribble(
  ~repeated, ~text, ~number_rows,
  1, "one", 3,
  2, "two", 2,
  3, "three", 1
) %>%
  as.data.frame()
parameters
## repeated text number_rows
## 1 1 one 3
## 2 2 two 2
## 3 3 three 1
So now we want to apply our function three times, once for each row of the
data.frame parameters.
Iterating with …
There are several ways to iterate.
… a for-loop
The most common way in programming is a for-loop:
# initialize an empty result with the expected columns
result <- data.frame(
  repeated = numeric(0),
  text = character(0),
  number_rows = numeric(0),
  generated_text = character(0),
  stringsAsFactors = FALSE
)
# iterate over the rows; length(parameters) would count the columns instead
for (i in seq_len(nrow(parameters))) {
  result <- rbind(result,
                  my_function(parameters[i, 1], parameters[i, 2], parameters[i, 3]))
}
result
## repeated text number_rows generated_text
## 1 1 one 3 one
## 2 1 one 3 one
## 3 1 one 3 one
## 4 2 two 2 twotwo
## 5 2 two 2 twotwo
## 6 3 three 1 threethreethree
That’s very ugly: you have to initialize the result-data.frame, and it’s slow
because rbind copies the growing result on every iteration.
Whenever you want to use a for-loop in R, step back and think about using something
else.
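If a loop is still the most readable option, one common workaround is to collect the pieces in a preallocated list and bind them once at the end. The following is only a sketch under that assumption: my_function and parameters are rebuilt here so the block runs on its own (parameters with plain data.frame instead of tribble), and pieces is a name chosen for illustration.

```r
# my_function is repeated from above so that this block is self-contained
my_function <- function(repeated = 1, text = "a", number_rows = 2) {
  row <- data.frame(
    repeated = repeated,
    text = text,
    number_rows = number_rows,
    generated_text = paste(replicate(repeated, text), collapse = "")
  )
  do.call("rbind", replicate(number_rows, row, simplify = FALSE))
}

# Same parameter values as above, built without the tidyverse
parameters <- data.frame(
  repeated = c(1, 2, 3),
  text = c("one", "two", "three"),
  number_rows = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# Preallocate one list slot per parameter row
pieces <- vector("list", nrow(parameters))
for (i in seq_len(nrow(parameters))) {
  pieces[[i]] <- my_function(parameters[i, 1], parameters[i, 2], parameters[i, 3])
}

# Bind all pieces in a single call instead of growing a data.frame step by step
result <- do.call(rbind, pieces)
nrow(result)  # 6
```

No empty result-data.frame has to be declared up front, and rbind runs only once.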
… lapply()
Instead of for-loops you should use lapply or another member of the apply family.
But lapply works with lists. data.frames are lists, but column-wise ones.
So we need to split the data.frame parameters into a list row-wise using split.
Then we can apply my_function to each element. Then we use do.call(rbind, x)
to merge the results into one data.frame.
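To make the intermediate step visible, here is a small sketch of what split returns for a parameter-data.frame like ours. The data.frame is rebuilt with plain data.frame so the block runs on its own; rows is a name chosen for illustration.

```r
# Same parameter values as above, built without the tidyverse
parameters <- data.frame(
  repeated = c(1, 2, 3),
  text = c("one", "two", "three"),
  number_rows = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# Splitting by the row index gives a list with one one-row data.frame per row
rows <- split(parameters, seq_len(nrow(parameters)))

length(rows)      # 3
names(rows)       # "1" "2" "3"
rows[["2"]]       # the second parameter row as a data.frame
rows[["2"]][[2]]  # "two", the value that ends up as the text argument
```

Because the list elements are named after the row index, rbind uses those names when it builds the row names of the combined result, which is why names such as 1.1 show up in the output.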
# split into one-row data.frames, call my_function on each, then bind the results
do.call(
  rbind,
  lapply(
    split(parameters, seq_len(nrow(parameters))),
    function(x) my_function(x[[1]], x[[2]], x[[3]])
  )
)
## repeated text number_rows generated_text
## 1.1 1 one 3 one
## 1.2 1 one 3 one
## 1.3 1 one 3 one
## 2.1 2 two 2 twotwo
## 2.2 2 two 2 twotwo
## 3 3 three 1 threethreethree
That’s a lot more R-like. But the winner is:
… pmap_dfr() from the purrr package
The most elegant way I know of is purrr’s pmap_dfr:
pmap_dfr(parameters, my_function)
## repeated text number_rows generated_text
## 1 1 one 3 one
## 2 1 one 3 one
## 3 1 one 3 one
## 4 2 two 2 twotwo
## 5 2 two 2 twotwo
## 6 3 three 1 threethreethree
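A note for newer purrr versions: as far as I know, pmap_dfr is marked as superseded since purrr 1.0.0, and the documentation suggests combining pmap with list_rbind instead. The sketch below assumes purrr 1.0.0 or later is installed; my_function and parameters are rebuilt so the block runs on its own.

```r
library(purrr)  # list_rbind() needs purrr 1.0.0 or later

# my_function is repeated from above so that this block is self-contained
my_function <- function(repeated = 1, text = "a", number_rows = 2) {
  row <- data.frame(
    repeated = repeated,
    text = text,
    number_rows = number_rows,
    generated_text = paste(replicate(repeated, text), collapse = "")
  )
  do.call("rbind", replicate(number_rows, row, simplify = FALSE))
}

# Same parameter values as above, built without tribble
parameters <- data.frame(
  repeated = c(1, 2, 3),
  text = c("one", "two", "three"),
  number_rows = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# pmap() returns a list of data.frames, list_rbind() stacks them row-wise
result <- list_rbind(pmap(parameters, my_function))
nrow(result)  # 6
```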
pmap_dfr matches the column names of the data.frame to the argument names of
the function. So the columns of the parameter-data.frame can be in any order:
# Mix the parameter columns
parameters_mixed_columns <- parameters %>%
  select(text, number_rows, repeated)
# pmap_dfr still works as wanted
pmap_dfr(parameters_mixed_columns, my_function)
## repeated text number_rows generated_text
## 1 1 one 3 one
## 2 1 one 3 one
## 3 1 one 3 one
## 4 2 two 2 twotwo
## 5 2 two 2 twotwo
## 6 3 three 1 threethreethree
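The matching by name has a flip side: every column is passed as a named argument, so a column the function does not know makes the call fail. The sketch below illustrates this with a hypothetical extra column comment and one possible way around it, which is to pass only the columns named in the function's formals. The names parameters_extra, outcome and wanted are chosen for illustration; my_function and parameters are rebuilt so the block runs on its own.

```r
library(purrr)

# my_function is repeated from above so that this block is self-contained
my_function <- function(repeated = 1, text = "a", number_rows = 2) {
  row <- data.frame(
    repeated = repeated,
    text = text,
    number_rows = number_rows,
    generated_text = paste(replicate(repeated, text), collapse = "")
  )
  do.call("rbind", replicate(number_rows, row, simplify = FALSE))
}

parameters <- data.frame(
  repeated = c(1, 2, 3),
  text = c("one", "two", "three"),
  number_rows = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical extra column for which my_function() has no argument
parameters_extra <- cbind(parameters, comment = c("x", "y", "z"))

# The call fails because `comment` is passed as an unused argument
outcome <- tryCatch(
  pmap_dfr(parameters_extra, my_function),
  error = function(e) "error"
)

# One option: keep only the columns that are arguments of the function
wanted <- names(formals(my_function))
result <- pmap_dfr(parameters_extra[wanted], my_function)
nrow(result)  # 6
```

Another option would be to give the function a ... argument that swallows unknown columns, at the price of hiding misspelled column names.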
Update
There’s another way you can iterate over the rows of a data.frame.