Norm Matloff, a professor of computer science at the University of California, Davis, has written a comparative analysis of R and Python, covering key points that have been debated for years in data science.
Before starting the analysis, Matloff discloses his potential bias: he has written four R-related books, has spoken at useR! and other R conferences, and is currently editor-in-chief of The R Journal. At the same time, he has written code in Python for many years. Matloff hopes his analysis will be seen as fair and helpful.
Matloff, who is both a computer scientist and a statistician, then compares R and Python on the following aspects:
Elegance
Python clearly wins.
Of course, this is subjective. But compared with other programming languages, Python greatly reduces the use of parentheses and braces.
Python is sleek!
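As a small illustration of the point about parentheses, here is a conditional block in Python, with a roughly equivalent R form shown in comments. The variable names and values are made up for this sketch.

```python
# Hypothetical values, chosen only for illustration.
x, y = 3, 2
z, w = 0, 0

# Python: indentation delimits the block; no parentheses or braces needed.
if x > y:
    z = 5
    w = 8

# A roughly equivalent block in R needs both:
#   if (x > y) {
#     z <- 5
#     w <- 8
#   }
print(z, w)
```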
Learning curve
R wins by a wide margin.
As an educator, Matloff is particularly interested in this point.
To use Python for data science, you have to learn a lot of material that is not part of base Python, such as NumPy, pandas, and matplotlib.
By contrast, matrix types and basic graphics are built into base R, and a novice can complete a simple data analysis within a few minutes.
Even for systems-savvy people, Python libraries can be hard to configure, whereas most R packages work out of the box.
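To make the point about matrix types concrete, the sketch below multiplies two small matrices using only base Python, where a matrix has to be represented as a list of rows. The helper name `matmul` and the sample values are assumptions for illustration, not something from Matloff's analysis.

```python
# Base Python has no matrix type, so a matrix is commonly a list of rows.
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    # zip(*b) iterates over the columns of b.
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))
```

In R the same product is written `a %*% b` on built-in matrix objects. In Python it would normally be done with NumPy, which is exactly the extra material a beginner has to pick up.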
Available libraries for data science
R wins slightly.
CRAN has more than 14,000 packages. PyPI has more than 183,000 packages, but it seems thin on data science.
Matloff gives an example: he once needed code to quickly compute the nearest neighbors of a given data point, and on CRAN he immediately found more than one package for the job. On PyPI, a rough search turned up nothing.
He also points out that the following searches on PyPI returned no results: EM algorithm; log-linear model; Poisson regression; instrumental variables; spatial data; familywise error rate; and so on.
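For readers unfamiliar with the nearest-neighbor task mentioned above, here is a minimal brute-force sketch in plain Python. It only illustrates the operation itself; it is not the method used by any particular CRAN or PyPI package, and the function name and sample data are hypothetical.

```python
import math

def nearest_neighbor(point, data):
    """Return (index, distance) of the row in data closest to point.

    Brute-force Euclidean search; assumes data is a non-empty list of
    numeric tuples with the same length as point.
    """
    best_index, best_dist = -1, math.inf
    for i, row in enumerate(data):
        d = math.dist(point, row)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist

# Made-up sample data for illustration.
data = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
idx, dist = nearest_neighbor((0.9, 1.2), data)
print(idx, round(dist, 3))
```

A dedicated package would typically use a spatial index such as a k-d tree instead of scanning every row, which matters once the data set is large.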
In fact, R's standard package structure is a great advantage: when you install a new package, you know exactly what to expect. R's generic functions are a similar advantage. When using a new package, you know you can probably call print(), plot(), summary(), and so on, which together form a kind of common language across packages.
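The idea of a generic function, where one name such as summary() dispatches on the type of its argument, can be sketched in Python with the standard library's `functools.singledispatch`. This is only a loose analogy to R's generics, and the `summary` function below is a hypothetical example rather than an existing Python API.

```python
from functools import singledispatch
from statistics import mean

@singledispatch
def summary(obj):
    """Fallback, loosely analogous to a default method in R."""
    return f"<{type(obj).__name__}>"

@summary.register
def _(obj: list):
    # Assumes a non-empty list of numbers.
    return f"n={len(obj)} min={min(obj)} mean={mean(obj)} max={max(obj)}"

@summary.register
def _(obj: dict):
    return f"{len(obj)} keys: {sorted(obj)}"

print(summary([1, 2, 3, 6]))
print(summary({"b": 1, "a": 2}))
print(summary(3.5))
```

The difference Matloff highlights is one of convention: in R, package authors routinely provide methods for the same few generics, so the caller's vocabulary carries over from one package to the next.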
Machine learning
Python wins slightly.
The R vs. Python debate is largely a statistics vs. CS debate. Since most research on neural networks comes from CS, the available neural network (NN) software is mainly in Python. RStudio has done some excellent work in developing a Keras implementation, but so far R has been limited in this area.
On the other hand, random forest research has mainly been carried out by the statistics community, and R has the advantage in that field. R also has excellent gradient boosting packages.
Python comes out a little ahead here because, for many people, machine learning means neural networks.
Statistical sophistication
R wins by a large margin.
Parallel computing
A tie.
Neither base R nor base Python supports multicore computing well. Threads in Python are fine for I/O, but because of the notorious Global Interpreter Lock they cannot be used for multicore computation. Python's multiprocessing package and R's parallel package work around this with separate processes rather than threads.
At present, Python has better GPU interfaces.
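To illustrate the point about the Global Interpreter Lock, the sketch below spreads a CPU-bound task over several processes with the standard `multiprocessing.Pool`. The prime-counting task, the chunk boundaries, and the pool size of 4 are arbitrary choices for this example, and no claim is made here about how well this approach scales.

```python
from multiprocessing import Pool

def count_primes(bounds):
    """Count primes in [lo, hi) by trial division (CPU-bound on purpose)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def main():
    # Each chunk is handled by a separate process, so the work is not
    # serialized by the Global Interpreter Lock as it would be with threads.
    chunks = [(0, 5000), (5000, 10000), (10000, 15000), (15000, 20000)]
    with Pool(processes=4) as pool:
        counts = pool.map(count_primes, chunks)
    print(sum(counts))

if __name__ == "__main__":
    main()
```

The price of this workaround is that arguments and results must be pickled and copied between processes, which is one reason process-based parallelism can feel clumsy compared with true multithreading.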
C/C++ interface and performance enhancement
R is slightly better.
Although tools such as SWIG can connect Python to C/C++, Python currently has nothing as powerful as R's Rcpp. The pybind11 package is under development.
In addition, R's new ALTREP concept has great potential for improving both performance and usability.
On the other hand, the Cython and PyPy variants of Python can in some cases remove the need for an explicit C/C++ interface altogether. Some would even say that Cython is a C/C++ interface.
Object orientation and metaprogramming
R is again slightly better.
For example, although functions are objects in both languages, R goes further than Python. Matloff says that whenever he works in Python, he is annoyed that he cannot print a function directly to the terminal or edit it, both of which he can do in R.
Python has only one OOP paradigm. R offers several (S3, S4, R6, etc.), though some may debate whether this is a good thing.
R has powerful metaprogramming features (code that generates code), but most computer scientists are unaware of them.
Language unity
R loses badly.
Python is currently moving from version 2.7 to version 3.x, which will cause some disruption, but nothing too complicated.
By contrast, R is rapidly splitting into two mutually unintelligible dialects: ordinary R and the Tidyverse. As an experienced R programmer, Matloff says he cannot read Tidyverse code because it calls many Tidyverse functions he does not know. Some netizens commented that
Linked data structures
Classical computer science data structures, such as binary trees, are easy to implement in Python. They are not part of base R, but they can be handled in various ways, for example with the datastructures package, which wraps the widely used Boost C++ library.
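As a rough illustration of how little code a binary tree takes in Python, here is a minimal binary search tree with insertion and in-order traversal. The class and function names are chosen for this sketch only, and the tree is unbalanced, so it is not meant as a production data structure.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(root: Optional[Node], key: int) -> Node:
    """Insert key into the tree rooted at root and return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored

def in_order(root: Optional[Node]) -> Iterator[int]:
    """Yield keys in ascending order."""
    if root is not None:
        yield from in_order(root.left)
        yield root.key
        yield from in_order(root.right)

root = None
for k in [5, 2, 8, 1, 9, 2]:
    root = insert(root, k)
print(list(in_order(root)))
```

The nodes here are ordinary mutable objects linked by reference, which is what makes this kind of structure natural to express in Python.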
Online help
First, R's basic help() function is much more informative than Python's, and it is complemented by example(). Most importantly, the ability to ship vignettes with an R package (retrieved with the vignette() function, and usually a practical introductory article in PDF format) makes R a real winner in this respect.
R/Python interoperability
The reticulate package, developed by RStudio, lets you run Python from R. It can serve as a bridge between the two languages and works well for pure computation, but it does not solve the thornier problems on the Python side, such as virtual environments.
Currently, Matloff does not recommend writing mixed Python / R code.
After all this analysis, the choice should still come down to actual needs. Neither language is simply better or worse than the other.