
Cohort analysis with R – “layer-cake-graph”

09th Jul `15, 11:05 AM in Data Science


Sergey Bryl

Cohort analysis is one of the most powerful and popular techniques available to marketers for assessing long-term trends in customer retention and calculating lifetime value.

If you have gone through Custora University, you may have been impressed by the “layer-cake graph” they propose for cohort analysis.

[Image: cohort analysis]

Custora says: “The distinctive “layer-cake graph”, produced by looking at cohorts in calendar time, can provide powerful insights into the health of your business. At a given point in time, what percentage of your revenue or profit came from new vs. repeat customers? Tracking how that ratio has changed over time can give you insight into whether you’re fueling top-line growth solely through new customer acquisition – or whether you’re continuing to nurture those relationships with your existing customers over time.”

Usually, we focus on calculating lifetime value or comparing cohorts, but I was really impressed by this analytical approach and tried to build such a chart in R.

After processing the raw data, we should end up with the following structure. Cohort01, Cohort02, etc. are cohort names based on the customer signup date or first purchase date, and M1, M2, etc. are the calendar months of the observation period (first month, second month, etc.):

[Image: cohort sum table]

For example, Cohort01 signed up in January (M1) and brought in $270,000 during that first month. Cohort05 signed up in May (M5) and brought in $31,000 in September (M9).

Now suppose you have processed your data and obtained a cohort.sum data frame that looks like the table above. You can replicate this data with the following code:

[Image: R code that builds the cohort.sum data frame]
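As a minimal sketch, a data frame of this shape could be built as follows. Only two figures come from the text above ($270,000 for Cohort01 in M1 and $31,000 for Cohort05 in M9); every other number is generated by an arbitrary decay rule purely for illustration, and the 12-by-12 size is an assumption.

```r
# Illustrative data only: the decay rule below is an assumption, not real revenue.
n <- 12
months  <- paste0("M", 1:n)
cohorts <- sprintf("Cohort%02d", 1:n)

# revenue[i, j] is the revenue of cohort i in calendar month j.
# A cohort earns nothing before its signup month, so cells left of the diagonal stay 0.
revenue <- matrix(0, nrow = n, ncol = n, dimnames = list(cohorts, months))
for (i in 1:n) {
  age <- 0:(n - i)                              # months since cohort i signed up
  revenue[i, i:n] <- round(270000 / (age + 1))  # made-up decay from $270,000
}

# The one other value quoted in the text is set explicitly.
revenue["Cohort05", "M9"] <- 31000

cohort.sum <- data.frame(cohort = cohorts, as.data.frame(revenue),
                         stringsAsFactors = FALSE)
rownames(cohort.sum) <- NULL

head(cohort.sum[, 1:6])
```

Each row is one cohort and each M column is one calendar month, which is the layout the chart needs.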

Let’s create the “layer-cake” chart with the following R code:

[Image: R code that draws the layer-cake chart]
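One way to draw such a chart, sketched here with ggplot2 and a stacked area geometry, is to reshape cohort.sum from wide to long format and stack one area per cohort. The variable names (cohort.chart, layer.cake) and the small stand-in data frame are assumptions made so the snippet runs on its own; the stand-in numbers are made up.

```r
library(ggplot2)

# Hypothetical stand-in, used only when cohort.sum is not defined yet.
if (!exists("cohort.sum")) {
  cohort.sum <- data.frame(
    cohort = c("Cohort01", "Cohort02", "Cohort03"),
    M1 = c(270000, 0, 0),
    M2 = c(85000, 260000, 0),
    M3 = c(60000, 80000, 250000)
  )
}

month.cols <- setdiff(names(cohort.sum), "cohort")

# Wide to long: one row per (cohort, calendar month) pair.
# unlist() walks the month columns one after another, so months vary slowest.
cohort.chart <- data.frame(
  cohort  = rep(cohort.sum$cohort, times = length(month.cols)),
  month   = rep(seq_along(month.cols), each = nrow(cohort.sum)),
  revenue = unlist(cohort.sum[month.cols], use.names = FALSE)
)

# One stacked area per cohort; reverse = TRUE is meant to place the
# oldest cohort at the bottom of the stack.
layer.cake <- ggplot(cohort.chart, aes(x = month, y = revenue, fill = cohort)) +
  geom_area(position = position_stack(reverse = TRUE), colour = "white") +
  scale_x_continuous(breaks = seq_along(month.cols), labels = month.cols) +
  labs(title = "Revenue by cohort", x = "Calendar month",
       y = "Revenue", fill = "Cohort")

print(layer.cake)
```

The reshape uses base R only; packages such as reshape2 or tidyr can do the same step if you already depend on them.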

And we get the following chart:

[Image: layer-cake chart]

It looks like there was a promotion in the eighth month (M8) and a few cohorts responded to it. A really useful graph.
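The new-versus-repeat question from the Custora quote can also be read off the same table. This is a rough sketch that assumes cohort i signed up in calendar month i, so the diagonal of the revenue matrix holds revenue from new customers. The stand-in numbers and the names new.share and repeat.share are made up for illustration.

```r
# Hypothetical stand-in, used only when cohort.sum is not defined yet.
if (!exists("cohort.sum")) {
  cohort.sum <- data.frame(
    cohort = c("Cohort01", "Cohort02", "Cohort03"),
    M1 = c(270000, 0, 0),
    M2 = c(85000, 260000, 0),
    M3 = c(60000, 80000, 250000)
  )
}

month.cols <- setdiff(names(cohort.sum), "cohort")
revenue <- as.matrix(cohort.sum[month.cols])  # cohorts in rows, months in columns

k <- min(dim(revenue))                        # months that have a matching cohort
new.revenue   <- diag(revenue)                # cohort i's revenue in its signup month i
total.revenue <- colSums(revenue)[seq_len(k)]

share <- data.frame(
  month        = month.cols[seq_len(k)],
  new.share    = new.revenue / total.revenue,
  repeat.share = 1 - new.revenue / total.revenue
)
print(share)
```

A falling new.share over time would suggest that a growing part of revenue comes from existing customers.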

Although the R code looks pretty simple, I spent most of the time aggregating the data. I can’t offer universal R code for that step, as the structure of your initial data may be completely different.

Have questions? Feel free to ask!

This article was originally published on analyzecore.
