

I've also written a guide on data manipulation in R for Stata users. The guide is centered on the packages data.table and dplyr, which bring R syntax closer to Stata while generally being an order of magnitude faster than Stata for common data manipulations. It targets topics I could not find elsewhere: equivalents of egen and by commands, panel data commands, macros, and in-place transformations of large datasets.

My benchmark is mostly about typical data manipulation; I agree it would be nice to know more about the estimation of statistical models. In my experience, user-written R packages tend to be faster than user-written Stata programs, since R packages tend to use C while user-written Stata programs just use Stata (Mata in the best case).
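To make the egen/by and in-place points concrete, here is a minimal sketch using data.table. The table `DT` and its columns (`id`, `year`, `sales`) are made up for illustration and do not come from the guide or the benchmark; the Stata lines in the comments are rough equivalents, not exact translations.

```r
library(data.table)

# Hypothetical toy panel: unit identifier, year, and a sales variable
DT <- data.table(
  id    = c(1, 1, 1, 2, 2, 2),
  year  = c(2001, 2002, 2003, 2001, 2002, 2003),
  sales = c(10, 12, 15, 100, 90, 95)
)

# Roughly: egen mean_sales = mean(sales), by(id)
# `:=` adds the column by reference, so DT is not copied
DT[, mean_sales := mean(sales), by = id]

# Roughly: xtset id year, then gen lag_sales = L.sales
# shift() lags by row position, so this assumes sorted years without gaps
setorder(DT, id, year)
DT[, lag_sales := shift(sales, n = 1, type = "lag"), by = id]

print(DT)
```

Because `:=` modifies the table in place, no second copy of the data is created, which is what matters most on large datasets.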

R has better tooling to query other databases, parse JSON data, scrape web sites, etc. R also has much better graphical tools (ggplot2). Stata, by contrast, could only hold one data set at a time. Feel free to edit the benchmark scripts if you spot mistakes.

Each chapter gives examples of real studies compiled from the literature.
#Stata vs R software
The result for 500MB (1e7 rows) may interest some of you. You can find more information on the GitHub repository, which contains a quick summary of the results, the R and Stata scripts I ran, and the results.

The authors develop the analysis step by step using appropriate R/Stata functions, which enables readers to gain an understanding of meta-analysis methods and their R/Stata implementation, so that they can use these two popular software packages to analyze their own meta-data.

R has many advantages over other software; the only problem is that its learning curve is very steep. R allows for manipulation in the regression command (e.g. you can easily log or square a variable within the formula). This extra work obviously slows down estimation. If he were to use R's lm.fit (which is more similar to Stata's basic reg command), R would likely perform much better.
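As a rough illustration of the two regression routes mentioned above, the sketch below fits the same model once with `lm` and a formula, and once with `lm.fit` on a design matrix built by hand. The simulated data and the variable names (`df`, `x`, `y`, `X`) are assumptions made for this example; it is not the benchmark code.

```r
set.seed(1)

# Simulated data, purely illustrative
n  <- 1000
df <- data.frame(x = runif(n, 1, 10))
df$y <- exp(0.5 + 0.3 * df$x - 0.02 * df$x^2 + rnorm(n, sd = 0.1))

# Transformations live inside the formula: log the outcome, square a regressor.
# I() protects ^ from its special meaning in formulas.
fit_lm <- lm(log(y) ~ x + I(x^2), data = df)

# lm.fit() skips the formula machinery and takes a ready-made design matrix
X <- cbind(intercept = 1, x = df$x, x2 = df$x^2)
fit_raw <- lm.fit(x = X, y = log(df$y))

# Compare the coefficients from both routes
same_coefs <- all.equal(unname(coef(fit_lm)), unname(fit_raw$coefficients))
print(same_coefs)
```

`lm` parses the formula and builds a model frame before fitting, while `lm.fit` goes straight to the least-squares step, so timing the two separately is one way to see how much of the cost comes from the convenience layer.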

I've run some benchmarks comparing the speed of R and Stata for common data manipulation, on datasets ranging from 100MB to 5GB.
