dplyr for programming

When someone asks me what the main advantage of R over Python is, I almost always answer: “It’s dplyr! The way you can handle data.frames is a dream.”

Pythonista: “But pandas also has DataFrames. They are built to resemble their counterparts in R.”

Me: “That’s true. But they only match the functionality of plain R. R has actually moved several steps ahead with dplyr.”

But dplyr is built mainly for interactive data exploration, which is why it is so easy to select, mutate, group and summarize your data.frame (or tibble). The reason is non-standard evaluation (NSE); see Hadley Wickham’s book Advanced R for more. NSE occurs when you refer to a column by its name without any quoting.
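To make the NSE point concrete, here is a minimal sketch using the built-in mtcars dataset (my choice for illustration only). The column names cyl, mpg and hp are typed bare, without quotes, and dplyr looks them up inside the data.frame.

```r
library(dplyr)

# Column names are typed without quotes; dplyr resolves them
# inside mtcars rather than in the calling environment.
result <- mtcars %>%
  select(cyl, mpg, hp) %>%
  mutate(hp_per_mpg = hp / mpg) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

result
```

This is very convenient at the console, because you never have to write mtcars$mpg or "mpg".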

But when it comes to programming, it gets a little more complicated.

So let’s look at an example:

dplyr 0.7.0 with great improvements for programming

dplyr 0.7.0 has been released. One of its greatest improvements is the better support for standard evaluation.

Let’s look at an example. Let’s say I want to apply a function to a column of a data.frame, but the name of the column can change from call to call. So the actual column name is the value of a string.
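The situation can be sketched roughly like this, again with mtcars and a hypothetical variable col_name that holds the column name (both are assumptions for illustration). A naive call hands the string itself to the function instead of the column it refers to.

```r
library(dplyr)

# Hypothetical setup: the column to work on is only known as a string.
col_name <- "mpg"

# Naive attempt: dplyr finds no column called col_name, falls back to the
# variable, and passes the character value "mpg" to mean().
# mean() of a character vector returns NA (with a warning).
naive <- suppressWarnings(
  mtcars %>% summarise(result = mean(col_name))
)

# Base R for comparison: [[ ]] accepts a string, so this gives the real mean.
expected <- mean(mtcars[[col_name]])
```

So the bare-name convenience of NSE turns into an obstacle as soon as the column name lives in a variable.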

Before dplyr 0.7.0, I used the following methods based on lazyeval: