Using git as package manager for R

Okay, the title may be a little misleading. But it addresses the problem I have.

I’m running many different R projects in production. So they must run in a reproducible way, and I can’t afford to have the programs break when a package update is installed.

dplyr for programming

When someone asks me what the main advantage of R over Python is, I almost always answer: “It’s dplyr! The way you can handle data.frames is a dream.”

Pythonista: “But pandas also has DataFrames. They are built to resemble their counterparts in R.”

Me: “That’s true. But they only match the functionality of plain R. R has since moved several steps ahead with dplyr.”

But dplyr is built mainly for interactive data exploration. So it’s very easy to select, mutate, group and summarize your data.frame (or tibble). The reason is non-standard evaluation (NSE); see Hadley Wickham’s book Advanced R for more. NSE occurs when you use a column name without any quoting.
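To make the unquoted column names concrete, here is a minimal sketch of interactive dplyr usage. The data frame and its column names (group, value) are made up for illustration.

```r
library(dplyr)

# Toy data; the column names are assumptions for this sketch.
df <- data.frame(group = c("a", "a", "b"), value = c(1, 2, 3))

# The columns are written bare, without quotes. dplyr looks them up
# inside df instead of in the calling environment (NSE).
result <- df %>%
  filter(value > 1) %>%
  group_by(group) %>%
  summarise(total = sum(value))
```

This is convenient at the console, because you type the column names exactly as they appear in the data.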

But when it comes to programming, it gets a little more complicated.

So let’s look at an example:

dplyr 0.7.0 with great improvements for programming

dplyr 0.7.0 has been published. One of the greatest improvements is the enhancement for standard evaluation.

Let’s look at an example. Let’s say I want to apply a function to a column of a data.frame. But the name of the column can change from call to call. So the actual column is given as the value of a string.
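One way this could look with dplyr 0.7.0 and later is sketched below. It assumes the rlang package is available; the helper name apply_to_column and the toy data are hypothetical and only serve as illustration.

```r
library(dplyr)
library(rlang)

# Toy data; the column names are assumptions for this sketch.
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# Hypothetical helper: apply `fun` to the column whose name is
# stored in the string `col`, overwriting that column.
apply_to_column <- function(data, col, fun) {
  col_sym <- sym(col)  # turn the string into a symbol
  # `!!` unquotes the symbol; `:=` allows an unquoted name on the left.
  data %>% mutate(!!col := fun(!!col_sym))
}

doubled <- apply_to_column(df, "y", function(v) v * 2)
```

The column name stays an ordinary string until the last moment, so it can come from a function argument, a config file, or a loop.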

Prior to dplyr 0.7.0, I used the following methods based on lazyeval: