While the language has been around since 1996, the momentum in the trade press about R has been steadily increasing over the last few years. R is part of the explosion of all things Data Science, which has shown great promise in providing meaning to data through the application of advanced analytics and the ability to visually present data.
Since the introduction of SQL Server 2016 which included R Services, it is now possible to use SQL Server as a method of moving R to SQL Server to be part of a scheduled process. For SQL Server to use R, R server needs to be included as part of the installation process. The installation process will want to download R from the Comprehensive R Archive Network [CRAN] website. If it is not possible to access the internet from the server, the installation will provide the links for downloading R. You can then provide the location of the downloaded files.
Once R Server is installed, SQL Server will need to be configured to run R. You will need to run the following script
sp_configure 'external scripts enabled', 1
GO
Reconfigure
GO
After this script is run, SQL Server will need to be restarted. After the restart, R can be run within SQL Server.
There are three versions of R which are used as part of SQL Server, CRAN R, Microsoft R Open and R Server. To understand why there are three different versions of R, one needs to understand a little about the interworking’s of R. R is designed to run completely in memory and was originally written mostly in Fortran and is single threaded. This means that R has some problems with scaling to run large amounts of data. Microsoft re-wrote a portion of the underlying R code in C, utilizing the Intel Math Kernel Libraries, which greatly improved the performance. Using Microsoft R Open is completely compatible with CRAN R, which means running the same code on Microsoft R Open can result in up to a 38% increase in performance with no coding changes applied.
When using R in other applications, whether on premise or in the cloud, it is also possible to change the version of R. When writing R for Azure ML, it is possible to select either CRAN R or Microsoft R Open. When running R in Azure ML, the performance of the code is not based on a single server, as Azure can scale to meet the processing needs of the code. Embedding R code inside of a Machine Learning experiment is another way to improve the performance of existing R code, as cloud resources can be utilized to run the R code.
It is possible to achieve greater performance in R code running on SQL Server because in addition to using all the standard functions included in R, Microsoft has implemented R code libraries which allow for R code to run both in and out of memory. Using these functions, which all begin with rx, the R code will use not only server memory, but also swap data to disk when memory is full. This functionality allows for data analysis to be performed on very large datasets without worrying about memory constraints. R Server is also not limited to running solely on SQL Server Data, it can also be run on Hadoop or Teradata as well. R Server can be installed as part of SQL Server, or on a separate server, to provide greater scale out capabilities.
While R can be run from within SQL Server Management Studio, this is not the best place for writing new R code, as the user interface is not designed for writing R. Visual Studio for R tools or R Studio provide much better development environments for writing R. When writing R code for SQL Server, R client will need to be installed as well.
SQL Server’s implementation of R provides the ability to not only run all the R code in use today, but also performance enhancements to make R part of the data environments going forward.