Interview with UN Big Data expert Pulak Ghosh: A big picture of big data in India

08th Oct '15, 12:44 PM in Analytics

Mastufa Ahmed Contributor

Dr. Pulak Ghosh, Professor of Quantitative Methods at the Indian Institute of Management Bangalore (IIMB), is a member of the UN Secretary-General’s Big Data Privacy Advisory Group. He recently won the National Biennial award 2015 for his outstanding contribution to Statistics.


Have Indian businesses been able to move beyond the hype to seize the big wealth of big data?

While the potential benefits of big data are real and significant, and some initial successes have already been achieved, many technical challenges must be addressed to fully realize this potential. Big data is supposed to be a $25 billion industry, and India has a great opportunity to take a large share of it. However, big data is at a nascent stage in general, and more so in India. Many believe that this technology is about large volumes of data. While this is true, large volumes have always been there; what compounds the intricacy is the nature of the data, which is mostly ‘unstructured’. The real value of big data lies in combining offline (structured) and online (unstructured) data and making the inference in real time.

Firms in India are yet to combine the two in this real-time fashion. They are either doing analytics on their massive data or doing analytics on social media data (mostly very simple positive-negative sentiment analytics), but separately. While some start-ups are moving in this direction, the wave is yet to become a tide. Heterogeneity, scale, timeliness, complexity, and privacy problems with big data impede progress at all phases of the pipeline that creates value from data. In India, analytics is mostly service based and very few offerings are product-embedded. Companies here are weighing its benefits, while many e-tailers are already storing data to underpin their likely future initiatives. They may not be cashing in on the data insights yet, but storing data is what big data calls for to start with, before moving on to advanced analytics initiatives.

What are the major drivers of big data and analytics, and who are the frontrunners in India?

Let us try to understand the growth pattern in India’s big data and analytics ventures. When I came back to India in 2009, after a decade-long stint in the United States as a data scientist in health care and biostatistics, there was almost nothing on data analytics. It was mostly data analysis! Use of data-based analysis in healthcare and clinical trials was almost non-existent and nowhere near comparable to the western market.

However, some movement started happening in the banking sector. Post 2008, banks started realizing the potential of the humongous customer data they already had stored. This eventually led them to do more business with their existing customers by betting on customer preferences and addressing their pain points. Banks also moved fast because they always had quality data! The new data-driven business model gave a fillip to their new initiatives of doing more business with on-board customers, given that banking data is more reliable than data from other business verticals such as retail. Still, it wasn’t until 2010/11 that frontrunners like ICICI Bank and HDFC Bank jumped onto the bandwagon in earnest. State Bank of India, for instance, is investing heavily in big data analytics today.

The next big users of big data analytics are e-commerce companies such as FlipKart and Amazon. They mostly use unstructured and structured data in a combined way. For example, e-commerce companies need to develop algorithms for cross-sell and up-sell. However, with nearly 1.5 lakh products on display, how does one run such an algorithm in real time? Added to that is the problem of sparseness in the data, as not every product gets sold frequently and there is an inherent minimum time before a customer buys the same product again!
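To make the cross-sell and sparseness point concrete, here is a minimal sketch in Python. It is not the method of any firm named in this interview; it simply counts which products are bought together and measures how few of the possible product pairs are ever observed. The baskets, product names and function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; a real catalogue would be far larger.
baskets = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"laptop", "mouse"},
    {"phone", "charger"},
    {"laptop", "bag"},
]

def co_purchase_counts(baskets):
    """Count how often each unordered pair of products appears in one basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
    return counts

def cross_sell(product, counts, top_n=2):
    """Return up to top_n products most often bought together with `product`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [item for item, _ in scores.most_common(top_n)]

def pair_density(counts, catalogue_size):
    """Share of all possible product pairs that were observed at least once."""
    possible = catalogue_size * (catalogue_size - 1) // 2
    return len(counts) / possible

counts = co_purchase_counts(baskets)
phone_suggestions = cross_sell("phone", counts)
density = pair_density(counts, catalogue_size=6)
```

Even in this toy catalogue of six products, only a third of the possible pairs are observed. With 1.5 lakh (150,000) products there are over 11 billion possible pairs, so most of them will never be seen together, which is the sparseness problem described above.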

The third segment where I see more and more use of analytics is retail and FMCG.

Please share some of the data-driven initiatives being taken by businesses.

Appreciation of analytics is gaining momentum at an exponential rate! Already-convinced players such as banks and e-commerce companies are taking their analytics expertise to the next level. While Citi, HSBC, HDFC, ICICI and Axis Bank now have dedicated teams to look at problems using advanced analytics, State Bank of India, the largest commercial bank in India, has started an analytics vertical with a balanced group comprising several statisticians, banking professionals and computer scientists to develop advanced analytics methods. Coming to retail, Amazon and FlipKart started betting on their data following the early success stories scripted by the banks. Snapdeal has also started analytics recently.

More and more firms are convinced today that there is a great deal of competitive advantage in taking decisions supported by findings from analytics. Some of the new players in this league are Madura Fashion, ITC, Marico and Spencer, to name a few.

Bangalore today has some 500+ analytics firms, while 80% of Indian firms have not yet moved to analytics. Apart from retail and banking, analytics adoption in healthcare is also picking up in India. Fortis and Apollo are among the healthcare providers betting on analytics. Apollo Hospital seeks consultations on a project basis in areas like resource optimization, dashboard analysis and patient segmentation. A fresh concept called ‘health eCommerce’ has started gaining traction recently. Phaneesh Murthy, former CEO of iGate, is launching an online healthcare marketplace.

Another area being leveraged by almost all firms with a presence on social platforms is ‘social media analytics’. This move transcends the borders of business verticals, with product companies keeping tabs on users’ sentiments and feedback. Today, customer sentiment analysis is one of the most common ways for companies to gauge their product strengths and weaknesses, which many are working on.

With India set to see a huge demand for data experts in the coming days, how do we get the manpower given that we don’t have interdisciplinary courses available in India yet?

This is a genuine problem. Those who understand business are not comfortable with programming or, for that matter, visualisation tools. Likewise, a programmer may not find statistics intriguing, and this gives rise to another problem: hiring multiple people, one for statistics, one for programming, and so on. Additionally, we don’t have enough statisticians in India. Having said that, things are changing, and I hope we will soon see interdisciplinary courses starting across institutes.

What steps, best practices, and how-tos do you suggest for those seriously pondering over leveraging analytics for business gains?

There are three pillars of analytics: technology, data science and business insight. This is teamwork. In the technology domain, knowledge of MapR, Hadoop, etc. plays a major role. In data science, the two main divisions are statistics and machine learning. This is where the most jobs are being created, and demand will keep growing. It is also the hardest one to get a grasp of. The third is business insight. Often analytics does not bear the desired fruit because of a lack of business insight. By business insight I do not mean business communication, but asking the right question. More often than not, asking the right question needs a great deal of business insight: half of the advantage of analytics lies in asking what is to be solved, and half in how to solve it.

In statistics, one has to understand that a lot of existing predictive models usually won’t work in a big data framework. A very simple example: is the “mean” still a good summary measure for 40 million transactional records? Does random sampling still work, given that a random sample from large data has a higher chance of being multimodal? How does one choose a set of predictors when the number of predictors in the data is more than 1000? The case of machine learning is similar. Usually, ML tools are not very robust, and with a slight change of variables the entire result can change. So how can one develop more robust algorithms? These are some of the many issues that the analytics industry is struggling with because of the lack of collaboration between academia and industry!
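The question about the mean can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not real transaction data: it draws amounts from two made-up customer groups (the group sizes, means and spreads are invented for illustration) and checks how many transactions actually lie near the overall mean.

```python
import random
import statistics

def bimodal_transactions(n, seed=0):
    """Simulate n transaction amounts from two hypothetical customer groups."""
    rng = random.Random(seed)
    amounts = []
    for _ in range(n):
        if rng.random() < 0.7:
            amounts.append(rng.gauss(500, 50))       # many small purchases
        else:
            amounts.append(rng.gauss(20000, 2000))   # fewer large purchases
    return amounts

def share_near(values, centre, tolerance=0.25):
    """Fraction of values within +/- tolerance * centre of `centre`."""
    lo, hi = centre * (1 - tolerance), centre * (1 + tolerance)
    return sum(lo <= v <= hi for v in values) / len(values)

amounts = bimodal_transactions(100_000)
mean = statistics.fmean(amounts)
median = statistics.median(amounts)
near_mean = share_near(amounts, mean)      # share of data close to the mean
near_median = share_near(amounts, median)  # share of data close to the median

print(f"mean={mean:.0f}, median={median:.0f}")
print(f"within 25% of mean: {near_mean:.1%}, of median: {near_median:.1%}")
```

In this toy setting the mean falls in the empty region between the two groups, so almost no transaction resembles it, while the median sits inside the larger group. Neither number alone reveals that there are two modes, which is why a single summary measure can mislead on data of this kind.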

Could you cite a few innovative data-driven initiatives being undertaken by Indian businesses?

Well, one of the first things that comes to my mind is the use of data analytics in the recently concluded national election campaign of the BJP! is working on an innovative idea of creating a digital smart basket which will be individualised and will help online shoppers shop more effectively.

HDFC Bank is using analytics to get a complete 360-degree view of the customer. State Bank of India is using a very innovative algorithm that combines online (social media, blog) and offline (customer-level) data to build a real-time recommender system for sending promotions. Gitanjali Group, the global integrated diamond and jewellery manufacturer and retailer, uses visual analytics to explore and analyse business data such as supply chain and profitability metrics.

If Booz Allen, KPMG, Apple, Walmart, Teradata, Netflix, Intel and Dell are the top hirers of data scientists in the US, who do you think will be the top hirers of data wizards in India?

Banks (both private and public), including Citi, HDFC, ICICI and SBI. E-commerce companies, including FlipKart, Amazon and Snapdeal. MasterCard and American Express. Retail firms like Future Group and Shoppers Stop, etc.

What do you do in the UN’s big data group?

My role is threefold: lending my expertise to the development of guidelines and practices that mitigate the risks associated with privacy in big data while preserving its utility for global development; contributing on how big data analytics can be used for responsible social value creation; and participating in an ongoing privacy dialogue, providing feedback on proposed approaches and engaging in a privacy outreach campaign.

This interview originally appeared on ITNEXT.