The 3 V’s of Big Data and their Technologies

The main challenge is that, as things stand today, there is no single technology that can cope with all the characteristics of big data – volume, velocity and variety – all at once.

This guest post was written by Tzachi Lunet-Levi, Director at Amdocs.

If you ask people what big data is, the boilerplate answer is usually “the 3 V’s” – volume, velocity and variety. They will then start discussing how big the data needs to be in order to be defined as “big data”.

[Figure: the 3 V's of big data – volume, velocity and variety]

And when you start looking at the actual technologies, it really starts to become complicated. The main challenge is that as things stand today, there is no single technology that can cope with all the characteristics of big data – volume, velocity and variety – all at once.

If I had to cluster big data technologies into large groups, I’d place them in the following approximate buckets:

  • Hadoop & MapReduce – a framework for data-intensive distributed applications running on commodity hardware
  • NoSQL – a new breed of databases that don’t provide the same consistency models as traditional relational databases
  • In-memory Databases – databases relying mainly on main memory for their data store
  • Columnar Data Stores – data stores arranged by column instead of by row, enabling faster access for analytic applications
  • Stream Processing – real-time computation systems used to filter and analyze huge amounts of data and events super-fast
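To make the first bucket concrete, the MapReduce programming model can be sketched in a few lines of plain Python – a toy, single-process illustration of the map/shuffle/reduce phases, not Hadoop's actual Java API:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # Reduce: sum all partial counts for a single key.
    return (word, sum(counts))

def mapreduce_word_count(documents):
    # Shuffle: group intermediate pairs by key, as the framework
    # would do between the map and reduce phases on the cluster.
    pairs = sorted(
        (pair for doc in documents for pair in map_phase(doc)),
        key=itemgetter(0),
    )
    return dict(
        reduce_phase(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )
```

The point of the model is that `map_phase` and `reduce_phase` are independent per key, so a framework like Hadoop can run thousands of copies of each across commodity machines.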

There’s more information here about these technologies, but at the end of the day, if we need to map them to the 3 V’s, we will get the following table:

[Table: mapping each technology group to the V's it addresses best]

These technologies are in constant flux, and some of them are actually a bucket of multiple different solutions – so there’s no need to nit-pick as to the table’s accuracy. It’s here just to give some indication of where each technology fits best.
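As one concrete illustration of why these buckets map to different V's, the analytic advantage of a columnar data store can be sketched with plain Python lists – a toy layout comparison, not any particular product's storage format:

```python
# Row store: one record at a time, all attributes together.
rows = [
    {"id": 1, "region": "EU", "revenue": 120.0},
    {"id": 2, "region": "US", "revenue": 340.0},
    {"id": 3, "region": "EU", "revenue": 75.0},
]

# Column store: one contiguous sequence per attribute.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "revenue": [120.0, 340.0, 75.0],
}

def total_revenue_rows(rows):
    # Must touch every field of every record to read one attribute.
    return sum(r["revenue"] for r in rows)

def total_revenue_columns(columns):
    # Scans only the single column the query actually needs.
    return sum(columns["revenue"])
```

Both queries return the same answer, but the columnar scan reads only the `revenue` values – which is why analytic workloads that aggregate a few columns over many rows favor that layout.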

What is interesting to see is how the Hadoop ecosystem is transitioning from solving the problem of volume to tackling velocity as well. Each Hadoop distribution vendor has its own homegrown approach to speeding up Hadoop. One such initiative is Hortonworks’ publicized attempt to increase the performance of Hadoop SQL querying by a factor of one hundred.

If there are two things to remember about big data, they are:

  1. Big Data is a moving target – The market of data processing is changing around us. Try to stay up to date with the trends and changes.
  2. There’s no silver bullet – As with anything else, there’s no “one-size-fits-all” solution. In fact, most enterprises end up using multiple big data technologies to solve their problems.

If you are looking for a solution to a big data problem, start by defining which of the big data V’s are causing you a headache and move on from there to select the set of technologies to use. If I had to guess, I’d say you’ll end up using more than one.

Picture credit: Big Data / Shutterstock

