dplyr 0.7.0 has been released. One of its greatest improvements is the better support for standard evaluation, that is, for working with column names that are held in variables.
Let’s look at an example. Say I want to apply a function to a column of a data.frame, but the name of the column can change from call to call. So the column to use is given by the value of a string.
Before dplyr 0.7.0 I used the following approach based on lazyeval:
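To see why this needs special handling, here is a minimal sketch of what happens when the variable is simply written inside `mutate()`. The data frame `df` and the use of `toupper()` are made up for illustration only.

```r
library(dplyr, warn.conflicts = FALSE)

# Illustrative data, not the data used in the rest of this post
df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)
key <- "b"

# Naive attempt: the `key` on the left is taken literally as the column name,
# and the `key` on the right is just the string "b", not the column b.
naive <- df %>%
  mutate(key = toupper(key))

names(naive)
```

The result gains a new column literally called `key`, filled with the upper-cased string, while column `b` is left untouched.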
library(dplyr, warn.conflicts = FALSE)
library(lazyeval)
data <- data.frame(a=c(1,2,3), b=c("aaaa", "bb&bb", "ccccc"), stringsAsFactors=FALSE)
data
## a b
## 1 1 aaaa
## 2 2 bb&bb
## 3 3 ccccc
# key holds the name of the column to be changed
key <- "b"
# mutate_call holds the function to replace each & by \&
mutate_call <- lazyeval::interp(~gsub("&", '\\\\&', var), var=as.name(key))
data %>%
  mutate_(.dots = setNames(list(mutate_call), key))
## Warning: `mutate_()` is deprecated as of dplyr 0.7.0.
## Please use `mutate()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## a b
## 1 1 aaaa
## 2 2 bb\\&bb
## 3 3 ccccc
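The doubled backslash in the printed output can be confusing, since the goal was to insert a single one. The following sketch, using a standalone string rather than the data frame, suggests that the value really contains one backslash and that the second one is only added by `print()` when it escapes the string for display.

```r
x <- "bb&bb"

# In R source, '\\\\&' is a three-character string: two backslashes and "&".
# In a gsub() replacement, two backslashes stand for one literal backslash.
out <- gsub("&", '\\\\&', x)

print(out)       # [1] "bb\\&bb"  (print() escapes the backslash)
writeLines(out)  # bb\&bb         (the raw characters)
nchar(out)       # 6
```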
Calling lazyeval::interp feels a little clumsy.
Starting with dplyr 0.7.0 you can do it this way:
library(dplyr)
# Same data as before
data <- data.frame(a=c(1,2,3), b=c("aaaa", "bb&bb", "ccccc"), stringsAsFactors=FALSE)
key <- "b"
# Define a simple function doing the replacement
my_func <- function(var_in) {
  var_out <- gsub('&', '\\\\&', var_in)
  return(var_out)
}
# Call mutate with some syntactic sugar:
data %>%
  mutate(!!key := my_func(.data[[key]]))
## a b
## 1 1 aaaa
## 2 2 bb\\&bb
## 3 3 ccccc
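As you can see, the new way is a lot easier to read and understand. I use three pieces of new syntax here:
- First, !!key is used to “unquote” key, so that its value becomes the name of the column to be written (here this is the name of an existing column, which is therefore overwritten).
- Second, := is used instead of = to assign the value of the right-hand side to that column, because R does not accept an unquoted expression on the left of =.
- Third, .data[[key]] is used to access the column of the data.frame whose name is stored in key.
All this and much more is explained here.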
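The three pieces can be combined into a small reusable function. This is only a sketch: the helper name `escape_ampersand` and the extra column `c` are hypothetical and were added for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: escape "&" in whichever column `key` names
escape_ampersand <- function(df, key) {
  df %>%
    mutate(!!key := gsub("&", "\\\\&", .data[[key]]))
}

# Same data as before, plus an illustrative second text column
data <- data.frame(
  a = c(1, 2, 3),
  b = c("aaaa", "bb&bb", "ccccc"),
  c = c("x&y", "z", "&"),
  stringsAsFactors = FALSE
)

res_b <- escape_ampersand(data, "b")  # only column b is changed
res_c <- escape_ampersand(data, "c")  # only column c is changed
```

Because the column name is passed as an ordinary string, the same function can be applied to any text column without touching its body.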