R – What’s in it for me?

08th Jul ’17, 10:34 AM in Data Science

Megan Mary Jane, Contributor

Although R has been around since the 1990s, it has only become widely known in the last few years. The explosion of data being produced by websites, the Internet of Things and so on has led to great interest in how to analyze and use that data to best effect, which is where R comes in.

Simple, robust and flexible, it fits the bill. Let’s take a more detailed look:

R is open source

R is open source. Anyone can download it and use the language for free over the internet. This is very different from some other statistics packages we could mention!

R is issued under the GNU General Public License (GPL), so there are no license fees to pay and very few restrictions on how you use it.

R also works on all major platforms and isn’t tied to one particular operating system. It runs on Windows (both 32-bit and 64-bit), Mac, Linux and UNIX (and its derivatives like Solaris).

R is efficient

R is a very efficient scripting language, which makes it well suited to resource-intensive tasks. It handles large data sets and resource-intensive simulations very efficiently.
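As a small illustration of the kind of simulation mentioned above, here is a minimal sketch in base R that estimates pi by random sampling. The sample size and seed are arbitrary choices made for this example, not figures from any benchmark. It works on whole vectors at once rather than looping over individual points, which is the usual way to keep R code fast.

```r
# Monte Carlo estimate of pi using vectorized operations (no explicit loop).
set.seed(42)                      # fixed seed so the run is repeatable
n_points <- 100000                # illustrative sample size (an assumption)

x <- runif(n_points)              # random x coordinates between 0 and 1
y <- runif(n_points)              # random y coordinates between 0 and 1

inside <- x^2 + y^2 <= 1          # TRUE where a point falls inside the quarter circle
pi_estimate <- 4 * mean(inside)   # the fraction inside approximates pi / 4

print(pi_estimate)
```

Increasing n_points should tighten the estimate, at the cost of more memory and time.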

For very big datasets, R can also be used on computer clusters. In one example, a commercial R package was used to process 30 million rows of data across 60 variables in just 10 minutes!

R is easy to learn

R is simple to learn. Statistics and statistical testing are tough to master, so if you are new to statistics, that is where your challenge will lie, not in learning R.

If you are looking to master R, there are plenty of ways you can learn it. Online blogs and articles explain it very nicely, as do some YouTube channels. If you’re short on time or motivation, there are also classroom training providers that will get you up the learning curve very quickly.

R is growing

There’s no point in learning a language that no one will be using in five years’ time.

R is currently the leading open source statistics and analytics package available, with over 2 million users. Its established lead and community mean that it is likely to stay that way for the foreseeable future, and that it will grow as the data analytics market grows in the years to come.

KDnuggets, a site dedicated to data science software, has had R as the top software package used by its readers for the last 5 years.

Because R is so widely used and so efficient, many advances in statistics are available in R before they reach other packages.

R programmers are well paid

Learning R is a good investment of time. The current average salary of R programmers is $126k according to the 2016 Dice Technology Salary Survey.

Demand for R programmers is likely to grow, not shrink, in the years to come. It is also likely to grow faster than the population of R programmers, making a good knowledge of R an increasingly rare and therefore valuable commodity.

R is widely supported

R is a very well supported open source language. There are lots of places to go if you get stuck, from Stack Overflow’s R forum to a wide variety of other online R forums.

Also, because R is so widely used, there are over 8,300 reusable libraries (known as packages) available for free online. They cover everything from identifying turtles to analyzing solar radiation. If you’re trying to solve a complex problem, the odds are that someone else has solved the same or a very similar problem before.

This can make using R a very efficient way to work. If you are lucky and someone has solved a similar problem previously, you can access their solution for free and won’t need to start with a blank computer screen.

There is also a growing community of for-profit companies that support R. The biggest are RStudio and Revolution Analytics (recently purchased by Microsoft), which provide tools and services relating to R.

R is easy to integrate with other software

Increasingly, software is being shipped with R integrated into it (for example, MS SQL Server 2016) because it works with them so simply. R can import data from a wide variety of sources, including Microsoft Excel and Access, MS SQL Server, MySQL, SQLite, and Oracle. Because it can import data over an ODBC (Open Database Connectivity) connection, it can be linked to almost any database package.
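To give a feel for what pulling data from a database looks like, here is a minimal sketch using SQLite, one of the sources listed above. It assumes the DBI and RSQLite packages are installed, and the sales table and its values are made up for this example. Note that it connects through the RSQLite driver rather than ODBC; an ODBC connection (for instance through the odbc package) would follow the same connect, query, disconnect pattern with a different driver.

```r
library(DBI)  # assumes the DBI and RSQLite packages are installed

# Open a throwaway in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical sample data, written to the database as a table.
sales <- data.frame(
  region = c("north", "south", "north"),
  amount = c(120, 80, 200)
)
dbWriteTable(con, "sales", sales)

# Run a SQL query and get the result back as an ordinary R data frame.
totals <- dbGetQuery(
  con,
  "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
)

dbDisconnect(con)  # always close the connection when finished

print(totals)
```

Once the query result is in a data frame, it can be analyzed like any other R data.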

R is outstanding at producing graphical output

R is the best package out there for producing striking, clear graphical output. It can produce any graph or visualization that you can name, and probably a fair few that you can’t. The graphs can be static or dynamic.

Because R is a fully programmable graphical language, your output is limited by your coding abilities, not the software itself.
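As a small taste of this, the sketch below draws a scatter plot with a fitted trend line using only base R and the mtcars data set that ships with it. The output file location, colors and labels are arbitrary choices for this example.

```r
# Scatter plot with a fitted regression line, written to a PDF file.
out_file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(out_file, width = 7, height = 5)     # open a PDF graphics device (size in inches)

plot(mtcars$wt, mtcars$mpg,
     main = "Fuel economy versus weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     pch = 19, col = "steelblue")

fit <- lm(mpg ~ wt, data = mtcars)       # simple linear model of mpg on weight
abline(fit, col = "firebrick", lwd = 2)  # overlay the fitted line

invisible(dev.off())                     # close the device so the file is written
```

In an interactive session you can leave out the pdf() and dev.off() calls and the plot will appear on screen instead.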