If you ever saw Prashant Bhattacharji at a bus stop, you probably wouldn't give him a second glance. Dressed in a plain shirt and a pair of trousers, he looks like any other IT professional you see on your way to work every day.
As soon as Bhattacharji gets to work, he turns on his PC and starts scanning sheet after sheet of data. But he is not an accountant. He is part of an emerging breed of professionals the world calls data scientists, a job title that came into existence only five years ago.
He plays with all these numbers, trying to decipher them and find patterns that only a very skilled mind could spot, and turns them into highly valuable information for MNCs, start-ups and even Fortune 500 companies across the world.
Bhattacharji works as a data scientist with HackerRank, a social platform for coders. But he has also put his skills to good use on another task: he managed to scrape the 2012 and 2013 exam results of almost a million students from India's two central boards, the CBSE and ICSE.
Through this experiment, he found something unusual. The distribution of scores was irregular, which suggested that score manipulation and inflation were rampant in these boards.
There were more intriguing results. “There also seemed to be an unspoken policy of passing anyone who scored as low as 20 marks, in an attempt to increase the passing percentage,” Bhattacharji says. He has published these findings on The Learning Point, an online education website he founded.
These results are publicly available on the boards' websites, but few of us would ever think of going through all of them, because it is such a daunting task. Not so for data scientists. Simply put, data scientists are the magicians of the 21st century. They take all the information we find incomprehensible and make sense of it.
So what skills would a person need to become a good data scientist? Academically, a basic foundation in computer science, mathematics and statistics is sufficient, says Bhattacharji. However, it is the alchemy of these skills that separates the good data scientists from the great ones.
Agreeing with him, Harsh Singhal, a data scientist at LinkedIn, says, “Data scientists come from multi-disciplinary backgrounds. They can code, create visuals, convert data into products and they can tell stories.” In other words, a data scientist has to be a storyteller. Data science is all about being curious and spotting trends by looking beyond the obvious.
Say, for instance, Flipkart, arguably India's most popular e-commerce portal, wants to know which of its customers are most likely to commit credit card fraud. How would it go about it? “If you look at a customer's files, all the raw data is available,” says Regunath B, the principal architect at Flipkart. But what is just raw data to the untrained eye is a juicy bag of information to a data scientist. And this is where the data scientist works his or her magic.
“A data scientist uses algorithms to predict possible fraud, and keeps the company updated,” he adds. Thus, a data scientist must be able to understand problems from the business's perspective.
The space they inhabit
With the internet, e-commerce websites and a growing number of start-ups, the world is more connected now than ever before. “In this era where data is pervasive, we need data scientists more than ever, especially in sectors such as banking, financial services and insurance, logistics, retail and e-commerce and healthcare,” says Chaitanya Sagar, the CEO of Perceptive Analytics.
In fact, the Government of India recruited a team of data geeks to execute the Aadhaar project in 2009. Regunath was part of the team selected to do the data crunching for what is the world's largest biometric identity system. He says, “No other nation had attempted this feat before.
“Issuing IDs for 1.2 billion people across the country is a massive data challenge. Let's put it this way: a movie file contains approximately 1GB of data. Here, we were looking at tens of thousands of times that amount of data.”
We need more data geeks
Although the rest of the world is catching on, Indian industry has yet to realise the potential of having good data scientists on board. That matters all the more in a country that has some of the largest datasets in multiple fields, be it agriculture, population or even media and entertainment.
A NASSCOM-CRISIL report puts the global big data opportunity for the Indian IT industry at $1 billion by 2015. However, according to various projections, there is likely to be a global shortage of 1.9 lakh (190,000) trained data scientists in the next few years, and 60% of that shortfall would likely be in India alone.
Organisations are trying to combat this shortage by identifying talent during the recruitment process. Sagar says, “In addition to personal interviews, we ask the candidate to work with us for a few days first. We look for comfort with data and the ability to read into it.”
He adds, “Great data scientists are like diamonds, and diamonds are rare. They are also very expensive.” That explains the comfortable six-figure salaries they can command.
However, he says, companies are willing to spend, whether that means hiring someone who is good at the job or training existing employees. Similarly, universities have started offering courses in data science. An aspiring data scientist would also do well to embrace online learning: Coursera.org offers free online courses from top universities in the US.
Companies also turn to popular data science competition portals when hiring data scientists. Two such portals, HackerRank and Kaggle, are used by Facebook, Microsoft, Amazon and Quora, among others, for hiring. Bhattacharji says, “People aren't necessarily fixated on specific academic qualifications as long as candidates can demonstrate their skills.”
With even the Harvard Business Review calling data scientist 'the sexiest job of the 21st century', there hasn't been a better time to be a data geek.