Learning Data Mining: 12 books on R

28th Jun '14, 10:46 PM in Data Mining

Guest Contributor

1. R Cookbook: Like the rest of the O’Reilly Cookbook series, this one offers how-to “recipes” for doing lots of different tasks, from the basics of R installation and creating simple data objects to generating probabilities, graphics and linear regressions. It has the added bonus of being well written. If you like learning by example or are seeking a good R reference book, this is well worth adding to your reference library. By Paul Teetor, a quantitative developer working in the financial sector.

2. R Graphics Cookbook: If you want to do beyond-the-basics graphics in R, this is a useful resource, both for its graphics recipes and for its brief introduction to ggplot2. While it goes well beyond the graphics capabilities that I need in R, I'd recommend it if you're looking to move past advanced-beginner plotting. By Winston Chang, a software engineer at RStudio.

3. R in Action: Data analysis and graphics with R: This book is aimed at users of all levels, with sections for beginning, intermediate and advanced R ranging from "Exploring R data structures" to running regressions and conducting factor analyses. The beginner's section may be a bit tough to follow if you haven't had any exposure to R, but once you've had a bit of experience it offers a good foundation in data types, imports and reshaping. There are some particularly useful explanations and examples for aggregating, restructuring and subsetting data, as well as a lot of applied statistics. Note that if your interest in graphics is learning ggplot2, there's relatively little on that here compared with base R graphics and the lattice package. You can see an excerpt from the book online: Aggregation and restructuring data. By Robert I. Kabacoff.

4. The Art of R Programming: For those who want to move beyond using R "in an ad hoc way … to develop[ing] software in R." This is best if you're already at least moderately proficient in another programming language. It's a good resource for systematically learning fundamentals such as types of objects, control statements (unlike many R purists, the author doesn't actively discourage for loops), variable scope, classes and debugging; in fact, the chapter on debugging is nearly as large as the one on graphics. It also includes some robust examples of solving real-world statistical problems in R. By Norman Matloff.

5. R in a Nutshell: A reasonably readable guide to R that teaches the language’s fundamentals — syntax, functions, data structures and so on — as well as how-to statistical and graphics tasks. Useful if you want to start writing robust R programs, as it includes sections on functions, object-oriented programming and high-performance R. By Joseph Adler, a senior data scientist at LinkedIn.

6. Visualize This: Note: Most of this book is not about R, but there are several examples of visualizing data with R. And there's so much other interesting info here about how to tell stories with data that it's worth a read. By Nathan Yau, who runs the popular Flowing Data blog and whose doctoral dissertation was on "personal data collection and how we can use visualization to learn about ourselves."

7. R For Dummies: I haven't had a chance to read this one, but it's garnered some good reviews. If you're familiar with the Dummies series and have found those books helpful in the past, you might want to check this one out. You can get a taste of the authors' style in the Programming in R section of the Dummies website, which has more than 100 short sections such as How to construct vectors in R and How to use the apply family of functions in R. By Joris Meys and Andrie de Vries.

8. Introduction to Data Science: It's highly readable, packed with useful examples and free: what more could you want? This e-book isn't technically an "R book," but it uses R for all of its examples as it teaches concepts of data analysis. If you're already familiar with that topic you may find some of the explanations rather basic, but there's still a lot of R code for things like analyzing tweet rates (including a helpful section on how to get Twitter OAuth authorization working in R), simple map mashups and basic linear regression. Although Stanton calls this an "electronic textbook," Introduction to Data Science has a conversational style that's pleasantly non-textbook-like. There used to be a downloadable PDF, but now the only versions are for OS X or iOS.

9. R for Everyone: Author Jared P. Lander promises to go over "20% of the functionality needed to accomplish 80% of the work." The topics that are actually covered are covered pretty well, but be warned that some items appearing in the table of contents get rather thin treatment. This is still a well-organized reference, with information that beginning and intermediate users will want to know: importing data, generating graphs, grouping and reshaping data, working with basic stats and more.

10. Statistical Analysis With R: Beginner's Guide: This book has you "pretend" you're a strategist for an ancient Chinese kingdom, analyzing military strategies with R. If you find that idea hokey, move along to another resource; if not, you'll get a beginner-level introduction to various tasks in R, including some you don't always see in an intro text, such as multiple linear regression and forecasting. Note: My early e-version had a considerable number of stray spaces in my Kindle app, but it was still readable and usable.

11. Reproducible Research with R and RStudio: Although categorized as a "bioinformatics" textbook (and priced that way: even the Kindle edition is more than $50), this is more general advice on the steps needed to make sure you can document and present your work. It includes numerous sections on creating report documents using the knitr package, LaTeX and Markdown, tasks not often covered in depth in general R books. The author has posted the source code for generating the book on GitHub, though, if you want to create an electronic version of it yourself.

12. Exploring Everyday Things with R and Ruby: This book oddly jumps from a couple of basic introductory chapters to some fairly robust, beyond-beginner programming examples; for those who are just starting to code, much of the book may be tough to follow at the outset. However, the intro to R is one of the better ones I've read, including a lot of language fundamentals and the basics of graphing with ggplot2. Plus, experienced programmers can see how author Sau Sheong Chang splits up tasks between a general-purpose language like Ruby and the statistics-focused R.