
Hadoop is NOT “Big Data” is NOT Analytics

Perhaps “smart data” ought to replace “big data” for most analytical applications

Arun Krishnan

I am amazed at the way the words “Hadoop”, “Big Data” and “Analytics” are bandied about in a very haphazard fashion these days. For those desirous of working in the field of Analytics (especially the very young but also some not so young), my earnest entreaty is to understand that these three words mean very different things. Using them interchangeably just demonstrates ignorance rather than expertise.

Perhaps a bit of history would help give some perspective. Folks in academia have been solving “big data” problems for a long time, using the power of cluster and distributed computing to tackle embarrassingly parallel problems. Before the advent of inexpensive “cloud-based” resources, universities and research organisations would build their own very large “super clusters” using commodity off-the-shelf (COTS) components or, going back even further, large shared-memory computers (the likes of Silicon Graphics sold these). As research and some large industrial organisations started building “Beowulf” clusters, they began putting together operating-system packages that made it easier to set up clusters quickly. Of course, people still had to write distributed applications on them using specialised languages, which could become quite involved.

The terms “Big Data” and “Hadoop” have gained favour in recent times. Hadoop has made it fairly easy for programmers to take any embarrassingly parallel problem and quickly spread it across large clusters. Big Data, on the other hand, is to me just the fuel that Hadoop works on, converting it into a form amenable to analysis. A person who can write code using Hadoop and its associated frameworks is not necessarily someone who can understand the underlying patterns in that data and come up with actionable insights. That is what a data scientist is supposed to do. Equally, data scientists might not be able to write the code that converts “Big Data” into “actionable” data. That is what a Hadoop practitioner does. These are very distinct job descriptions.
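To make the distinction concrete, here is a minimal sketch of the classic word-count job written against Hadoop's standard MapReduce Java API; the class names follow the well-known tutorial example and are not drawn from this article. Writing, packaging and tuning jobs like this is the Hadoop practitioner's work; deciding what the resulting counts actually mean for the business is the data scientist's.

// Minimal sketch of the canonical Hadoop MapReduce word-count job
// (class and variable names are the standard example's, used here for illustration).
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs in parallel across input splits, emitting (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts for each word gathered from all the mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The framework handles splitting the input, scheduling mappers and reducers across the cluster and recovering from node failures, which is precisely why Hadoop made embarrassingly parallel work so much more accessible than hand-rolled cluster code.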

Big Data, too, has its own interpretations. People typically identify Big Data using the four Vs: the Volume of data, its Velocity (the frequency with which data comes in), the Variety of data types, and its Veracity, or the goodness of the data. But one of the best definitions I have heard is this: “Big data is one byte more than your system can handle.” For example, HR data comes in a wide variety of forms with very low veracity (the data is quite noisy), but compared with the streaming data generated by the likes of e-commerce, its volume and velocity are low. Yet, given the modest computing power of typical HR systems, even a few gigabytes can feel like big data to its practitioners.

Thus “big data” itself is a relative term that I believe has outlived its usefulness. Perhaps “smart data” ought to replace “big data” for most analytical applications!

The writer is founder and CEO of the HR analytics start-up Factorial Analytical Sciences
