
Hadoop is NOT “Big Data” is NOT Analytics

Perhaps “smart data” ought to replace “big data” for most analytical applications

Arun Krishnan

I am amazed at the way the words “Hadoop”, “Big Data” and “Analytics” are bandied about in a very haphazard fashion these days. For those desirous of working in the field of Analytics (especially the very young but also some not so young), my earnest entreaty is to understand that these three words mean very different things. Using them interchangeably just demonstrates ignorance rather than expertise.

Perhaps a bit of history would help give some perspective. Folks in academia have been solving “big data” problems for a long time, using the power of cluster and distributed computing to tackle embarrassingly parallel problems. Before the advent of inexpensive “cloud-based” resources, universities and research organisations would build their own very large “super clusters” using commodity off-the-shelf (COTS) components or, going back even further, large shared-memory computers (the likes of Silicon Graphics sold these). As research and some large industrial organisations started building “Beowulf” clusters, they began putting together operating-system packages that made it easier to set up clusters quickly. Of course, people still had to write distributed applications on them using specialised languages, which could become quite involved.

The terms “Big Data” and “Hadoop” have gained favour in recent times. Hadoop has made it fairly easy for programmers to take any embarrassingly parallel problem and quickly spread it across large clusters. Big Data, on the other hand, is to me just the fuel that Hadoop works on, converting it into a form amenable to analysis. A person who can write code using Hadoop and its associated frameworks is not necessarily someone who can understand the underlying patterns in that data and come up with actionable insights. That is what a data scientist is supposed to do. Equally, data scientists might not be able to write the code that converts “Big Data” into “actionable” data. That is what a Hadoop practitioner does. These are very distinct job descriptions.
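To make the distinction concrete, here is a minimal sketch of the classic word-count job written against Hadoop's standard MapReduce Java API; the class names follow the well-known tutorial example and are not drawn from this article. Writing, packaging and tuning jobs like this is the Hadoop practitioner's work; deciding what the resulting counts actually mean for the business is the data scientist's.

// Minimal sketch of the canonical Hadoop MapReduce word-count job
// (class and variable names are the standard example's, used here for illustration).
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs in parallel across input splits, emitting (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts for each word gathered from all the mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The framework handles splitting the input, scheduling mappers and reducers across the cluster and recovering from node failures, which is precisely why Hadoop made embarrassingly parallel work so much more accessible than hand-rolled cluster code.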

Big Data, too, has its own interpretations. People typically identify Big Data using the four Vs: the Volume of data, its Velocity (the frequency with which data comes in), the Variety of data types, and its Veracity, or the goodness of the data. But one of the best definitions I have heard is this: “Big data is one byte more than your system can handle.” For example, HR data comes in a wide variety of forms with very low veracity (the data is quite noisy), but compared with the streaming data generated by the likes of e-commerce, its volume and velocity are low. Yet, given the modest computing power of typical HR systems, even a few gigabytes can feel like big data to its practitioners.

Thus “big data” itself is a relative term that I believe has outlived its usefulness. Perhaps “smart data” ought to replace “big data” for most analytical applications!

The writer is founder and CEO of the HR analytics start-up Factorial Analytical Sciences
