Predictive modelling is the process of creating a statistical model to predict future behaviour. It is an area of data mining concerned with forecasting probabilities and trends. A predictive model is made up of predictors, which are factors that influence future results. For example, a retail shop's model might consider a customer's gender, age and purchase history to predict future sales.
Key features: A predictive model is described by three key features.
1. The predicted outcome
2. The predictors
3. How the predictors are used to create the outcome
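As a rough illustration of how an outcome, a set of predictors, and a rule for combining them fit together, the sketch below scores one retail customer with a logistic function. The predictor names, coefficient values and intercept are all hypothetical and chosen only for illustration. In practice the coefficients would be estimated from historical data.

```python
import math

# Hypothetical predictors for one customer (names and values are illustrative).
customer = {"age": 35, "is_female": 1, "past_purchases": 4}

# Hypothetical coefficients; a real model would estimate these from data.
weights = {"age": 0.01, "is_female": 0.2, "past_purchases": 0.3}
intercept = -2.0


def predict_purchase_probability(predictors, weights, intercept):
    """Combine the predictors into a probability using a logistic function."""
    score = intercept + sum(weights[name] * value for name, value in predictors.items())
    return 1.0 / (1.0 + math.exp(-score))


# The predicted outcome: the probability that this customer buys again.
probability = predict_purchase_probability(customer, weights, intercept)
print(f"Predicted probability of a future purchase: {probability:.3f}")
```

A logistic function is only one possible way to combine predictors. It is used here because it always returns a value between 0 and 1.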
Scope: The need for predictive models is on the rise. Initially, predictive analytics was used in spam filtering systems. Now the scenario has changed: predictive models have become a vital part of CRM, change management, disaster recovery, security management and meteorology.
Predictive analytics supports decision making by diagnosing the business. It helps a firm eliminate time-consuming processes. It can be used in almost every field, with marketing and pricing being its predominant users.
Developing a model: The first step in developing a predictive model is selecting relevant candidate predictor variables for possible inclusion in the model. A limited number of variables must be selected from a vast list, and care is needed not to bring bias into the selection process. Inappropriate selection of variables is an important and common cause of poor model performance. A major issue in developing a predictive model is dealing with missing data.
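One simple way of dealing with missing data is to replace each missing value with the mean of the observed values for that variable. The sketch below shows this on a handful of hypothetical records. The field names and values are invented for illustration, and mean imputation is only one option. Depending on why the values are missing, other approaches may be more appropriate.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"age": 25, "past_purchases": 3},
    {"age": None, "past_purchases": 5},
    {"age": 40, "past_purchases": None},
    {"age": 31, "past_purchases": 1},
]


def impute_with_mean(rows, column):
    """Return copies of rows with missing values in `column` set to the observed mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill_value = mean(observed)
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]


# Impute each predictor in turn; the original records are left untouched.
completed = records
for column in ("age", "past_purchases"):
    completed = impute_with_mean(completed, column)

for row in completed:
    print(row)
```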
Validating the model: Validation can be internal or external. A common approach to internal validation is to split the data set into two portions: a “training set” and a “validation set”. The objective of external validation is to apply a previously developed model to new individuals whose data were not used in model development, and to quantify the model’s predictive performance.
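The internal split into a training set and a validation set can be sketched as follows. The function name, the 30% validation fraction and the fixed random seed are assumptions made for this example, not recommendations from the text.

```python
import random


def train_validation_split(rows, validation_fraction=0.3, seed=0):
    """Shuffle the rows and split them into a training set and a validation set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(rows)          # copy, so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Ten placeholder records standing in for a real data set.
data = list(range(10))
training_set, validation_set = train_validation_split(data)
print(len(training_set), "training records,", len(validation_set), "validation records")
```

The model is fitted on the training set only. The validation set is held back until the model's performance is measured.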
When a validation study shows disappointing results, researchers are often tempted to reject the initial model and to develop a new predictive model using the validation cohort data.
Assessing the performance of the model: When assessing model performance, it is important to remember that explanatory models are judged based on strength of associations, whereas predictive models are judged solely based on their ability to make accurate predictions. The performance of a predictive model is assessed using several complementary tests, which assess overall performance, calibration, discrimination, and reclassification.
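Two of these measures are easy to compute by hand. The sketch below shows the Brier score as one measure of overall performance and the c-statistic (the area under the ROC curve) as one measure of discrimination. It also makes a crude calibration check by comparing the mean predicted probability with the observed event rate. The outcomes and predicted probabilities are made-up values for illustration, and these particular measures are common choices rather than ones named in the text.

```python
def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probability and actual outcome."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)


def c_statistic(outcomes, probabilities):
    """Share of (event, non-event) pairs in which the event got the higher probability."""
    events = [p for y, p in zip(outcomes, probabilities) if y == 1]
    non_events = [p for y, p in zip(outcomes, probabilities) if y == 0]
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5   # ties count as half
    return concordant / (len(events) * len(non_events))


def calibration_gap(outcomes, probabilities):
    """Mean predicted probability minus observed event rate (0 means well calibrated overall)."""
    return sum(probabilities) / len(probabilities) - sum(outcomes) / len(outcomes)


# Hypothetical validation data: actual outcomes and the model's predicted probabilities.
outcomes = [1, 0, 1, 0, 1]
probabilities = [0.9, 0.2, 0.6, 0.7, 0.8]

print("Brier score:", brier_score(outcomes, probabilities))
print("c-statistic:", c_statistic(outcomes, probabilities))
print("Calibration gap:", calibration_gap(outcomes, probabilities))
```

A lower Brier score is better. A c-statistic of 0.5 means the model discriminates no better than chance, and 1.0 means perfect discrimination.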
Several factors are driving predictive analytics. Some of them are:
Technological advances: The techniques used in predictive analysis are chosen according to the amount of data the processes use. Some may require thousands of calculations to be carried out at high speed. Advanced hardware and software packages that can handle these calculations are analysed and validated.
Data availability: The validation of a predictive model depends on the data available to develop it. Having enough data to decide on a model is mandatory. In addition to its own data, the business must also use data from third parties to build the model.
Limitations: Predictive models have their limitations. A predictive model cannot be created without a sufficient dataset. The concept of the product to be predicted should be clear. Changes are likely to occur in a business, so one cannot be sure that the data will yield an accurate model.
Some companies treat the predictive model as a black box. The results of a predictive model should be checked to see whether they make sense.
A number of tools available in the market can help with predictive analytics.
Weka is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. It contains a collection of visualization tools and algorithms for analysis and predictive modelling.
Apache Mahout is a project to produce free implementations of distributed or otherwise scalable machine learning algorithms, focused primarily on collaborative filtering, clustering and classification.
Minitab is a statistics package developed at the Pennsylvania State University. It helps users analyze their data and improve products and services, and is used for quality improvement.
Oracle Data Mining provides powerful data mining functionality as native SQL functions within the Oracle Database. The Oracle Spreadsheet Add-In for Predictive Analytics provides predictive analytics operations.
Stata is a general-purpose statistical software package created in 1985 by StataCorp. Stata’s capabilities include data management, statistical analysis, graphics, simulations, and regression analysis.