Business Intelligence

When R may be a better option than M in Power BI

31st Aug `16, 02:00 PM in Business Intelligence

Microsoft has really impressed me with Power BI. They are working very hard to improve it by making…

Ryan Wade
Ryan Wade Contributor
Follow

Microsoft has really impressed me with Power BI. They are working very hard to improve it by making weekly updates to the Power BI service and monthly updates to the Power BI desktop.

One of the recent updates that was made to Power BI desktop is the GA release of the ability to use R as a data source. Many have recommended using this option to score your data before bringing it into Power BI via one of the many data science algorithms available in R. That is a very good use case. But in addition to that I recommend using it as a viable alternative to performing difficult data wrangling tasks that M is not equipped to efficiently handle. Here are 4 situations where R may be a better option than M in Power BI for data wrangling:

1. When you can benefit from leveraging a data wrangling package on CRAN

2. When you need to leverage regular expressions

3. When you want to make a date dimension table on the fly

4. When you want to log information about your data loads

When you can benefit from leveraging a data wrangling package on CRAN

We already know how great R is for data science. But little do many know it is also great for the type of data wrangling one does to prepare their data for Power BI. R is a more mature language than M and it has a bigger community of users than M. Many bright R developers have written packages that adds to the capabilities of R.

When referring to what can be done in iOS, Apple often say that there is an “app” for that. Likewise, when R developers refer to what can be done in R, we often say that there is a “package” for that. For instance:

· If one needs to scrap data from the web there are packages for that (rvest, httr, rcurl, and others)

· If one needs to make complicated transformations to their data there are packages for that (dplyr, tidyr, lubrdiate, stringr, and others)

· If one needs to extract data from SQL or NoSQL data sources there are packages for that (RODBC, SparkR, RMongo, RHadoop, and others).

· If one needs to work with XML or JSON data there are packages for that as well (XML, jsonlite, and others).

When you need to leverage regular expressions

Data is very valuable. Some call it the new gold. Like gold you have to cleanse or purify it in order to make it usable. Often the methods you use to cleanse your data are straight forward and tools like Power Query will suffice. But sometimes your data is pretty dirty or it is in a complicated format. In many of those cases some custom M code will suffice. But there are a few instances where that will not be the case.

Let’s say that your data is in a format that is hard to parse because there is not a single delimiter that you can use. But if there is a clear pattern that can be used to parse your data than you could benefit from regular expressions. Unfortunately, regular expressions are not available in M but they are available in R.

When you want to make a date dimension table on the fly

There are multiple ways you can bring in a date dimension table into Power BI. You can bring one in from your data warehouse, you can create one on the fly using Power Query and M, and you can create one on the fly using DAX.

A lesser known way is using R and leveraging the “lubridate” package created by my favorite R package developer, Hadley Wickham. Here is the link to my github repo that shows an example of using R and the lubridate package to create a date dimension table.

The lubridate package have a lot of easy to use and very forgiving date functions that can be used to create date attributes for your date dimension table. Let’s take a look at the “ymd” function as an example. This function returns the date representation of the text that is passed to it that can be interpreted as a year-month-day format. The following code examples all returns a date of 2016–08–11: ymd(20160811), ymd(“2016 August 1”), and ymd(160811). Amazing! That is a “drop the mic” moment! LOL. M is not that easy and is not that forgiving. The lubridate package also have functions that handles ISO year, ISO week, and ISO day functions so using R you could easily add those date attributes to your date dimension table. That functionality is not natively available in M.

Note that in the code example I included in the repo, I was able to use the “rvest” package to scrap a website for federal holidays and add that information as an attribute to my date dimension table. I can probably do the same in Power Query but it was more intuitive for me to do it in R.

When you want to log information about the load

M does not give you the ability to log information about your loads. That is not the case with R. Using R you can do the following:

· Write summary statistics about the data that was loaded to a file

· Test your data against control totals and write those results out to a file

· If you are using the “readr” package to load your data you can get your load errors from the “problem” function and write that information to a file.

None of this is possible in M.

M is a great programming language for data munging and it has a lot of potential. Microsoft developed a very intuitive interface for M via Power Query that allows business analysts to perform most data transformation tasks in a relatively easy way. But if you are faced with a very difficult data transformation task then R may be a better option.

Originally appeared on Medium.

MORE FROM BIG DATA MADE SIMPLE