Predictive modelling is all the rage these days. However, end users and data scientists have very different ideas about what predictive models truly are. With analytics becoming a buzzword and machine learning gaining widespread recognition, there is a tendency to view predictive modelling as a magic wand, a panacea for every analytics problem that a spiteful life throws at us.

Data scientists are often asked how to make their predictive models more accurate and whether newer algorithms can improve the quality of their models. The answer, as with most complex questions, is: it depends. There really is no one-size-fits-all where the accuracy of predictive models is concerned. Engineering problems, which are fairly well defined, can require accuracies of 99% or above. In contrast, biologists dealing with complex systems seem to be deliriously happy if they approach an accuracy of around 75%. Given this wide disparity, there is a lot of confusion among people trying to understand what predictive modelling is all about. My attempt here is to dispel a few myths about it.

Myth 1: Machine learning can predict anything – Well, no. Machine learning approaches depend on the data fed into the system. They can learn from the data and use it to predict the future, provided that future resembles what is in the dataset used to train the model. Nassim Nicholas Taleb, in his book The Black Swan, illustrates this with the example of someone trying to predict the health of a turkey in the US, given only information about the preceding days. Let us assume the person starts predicting at the beginning of the year. His predictions are going to be about the continued health of the turkey. On Thanksgiving Day, however, his prediction is going to be horrendously wrong, since nothing in the data he had accumulated was remotely similar to such an event. Hence, the model would fail to predict the death of the turkey on Thanksgiving Day.
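The turkey example can be played out in a few lines of Python. This is only a rough sketch under invented assumptions: the day-of-year index used for Thanksgiving, the function names `fit` and `predict_alive`, and the toy observations are all hypothetical, and the "model" is just a table of survival rates per day. It shows a model trained on this year's data alone getting Thanksgiving wrong, and the same model getting it right once data from earlier years, which do contain the event, is included.

```python
from collections import defaultdict

THANKSGIVING = 330  # assumed day-of-year index, for illustration only


def fit(observations):
    """Learn, for each day of the year, how often the turkey was alive.

    `observations` is a list of (day_of_year, alive) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # day -> [alive_count, total_count]
    for day, alive in observations:
        counts[day][0] += int(alive)
        counts[day][1] += 1
    total_alive = sum(a for a, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return {"per_day": dict(counts), "overall": total_alive / total}


def predict_alive(model, day):
    """Predict True if the learned survival rate for this day is at least 0.5."""
    if day in model["per_day"]:
        alive, total = model["per_day"][day]
        rate = alive / total
    else:
        rate = model["overall"]  # day never seen: fall back to the overall rate
    return rate >= 0.5


# This year only: the turkey has been alive on every day observed so far.
this_year = [(day, True) for day in range(1, THANKSGIVING)]

# Hypothetical earlier years: turkeys were alive until Thanksgiving, then not.
previous_years = [(day, day < THANKSGIVING) for _ in range(3) for day in range(1, 366)]

naive_model = fit(this_year)
informed_model = fit(this_year + previous_years)

print(predict_alive(naive_model, THANKSGIVING))     # True: the event is not in the data
print(predict_alive(informed_model, THANKSGIVING))  # False: earlier years contain it
```

The algorithm is identical in both cases; only the training data differs.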
So the first lesson is that predictive models can't predict what is not present in the training data.

Now, this can be depressing to those who have looked at predictive modelling as a magic wand. So what can these models predict, then? There are plenty of things that predictive models can do as long as we understand their limitations. In the turkey example, the only approach that would work is to build the model using data from previous years, which include earlier Thanksgivings. A machine learning algorithm would then be able to account for the sudden demise of the turkey on that particular day. It is all about the data.

Myth 2: Complicated algorithms can help to improve accuracies – It is a rookie mistake to assume that applying more complicated algorithms to the same data set will give better model accuracies. Given the state of machine learning algorithms today, the accuracies of most models built on the same data will fall within a few percentage points of each other. Algorithmic complexity can never provide enough cover for bad data. As the old adage goes, 'Garbage in, garbage out', and it is completely applicable to predictive modelling as well.

Myth 3: We need to use as much data as we can get – What we need is not large volumes of data but large volumes of relevant data. For example, if we were trying to build a predictive model of employee churn, we might need employee performance data, rewards and recognition data, compensation, pay grades, age, experience, tenure, leaves taken, distance from work and educational details to start with.

Myth 4: Once we have the data we are guaranteed a good model – Having data is no guarantee of a successful predictive model. Hence, before building a model it is essential to conduct exploratory data analysis and identify the correlations of the different variables with the dependent variable (churn in this case) to see if there really is any point in even building the model.
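The correlation check described above can be sketched in plain Python. Everything here is an assumption made for illustration: the variable names (`tenure_years`, `distance_km`, `desk_number`), the toy numbers, and the choice of Pearson correlation, which only captures linear relationships and is one of several measures that could be used at this stage.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


# Made-up toy data: 1 = employee left, 0 = employee stayed.
churn = [1, 1, 1, 0, 0, 0, 0, 1]

# Hypothetical candidate variables, one value per employee.
features = {
    "tenure_years": [1, 2, 1, 8, 6, 9, 7, 2],
    "distance_km": [30, 25, 5, 10, 12, 8, 28, 22],
    "desk_number": [4, 17, 9, 12, 5, 16, 8, 13],
}

# Rank variables by the strength (absolute value) of their correlation with churn.
ranked = sorted(
    ((name, pearson(values, churn)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

If every candidate variable sat near zero, the way the deliberately irrelevant `desk_number` does here, that would be an early sign that the data may not support a useful churn model. With only a handful of rows, as in this toy data, any such reading is of course tentative.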
Choosing the right variables (also called feature or variable selection) is critical for building the right model.

Myth 5: A model once built is applicable for all time – Life is not constant. Things change, and so does the applicability of a predictive model. If the demographic profile of your organisation changes, chances are that the predictive attrition model that was so lovingly built a year ago may be useless at predicting attrition in the current population. Hence, as new data comes in, models need to be rebuilt.

As you can see, predictive modelling is an art as well as a science. Understanding the limitations of these approaches is critical before we can start thinking about implementing them in our organisations.

The writer is founder & CEO of HR analytics start-up nFactorial Analytical Sciences
