This guest post was written by Tzachi Lunet-Levi, Director at Amdocs.
If you ask people what big data is, the boilerplate answer is usually “the 3 V’s” – volume, velocity and variety. From there, the discussion typically turns to just how big data needs to be in order to qualify as “big data”.
And when you start looking at the actual technologies, it really starts to become complicated. The main challenge is that as things stand today, there is no single technology that can cope with all the characteristics of big data – volume, velocity and variety – all at once.
If I had to cluster big data technologies into large groups, I’d place them in the following approximate buckets:
- Hadoop & MapReduce – a framework for data-intensive distributed applications running on commodity hardware
- NoSQL – a new breed of databases that don’t provide the same consistency models as traditional relational databases
- In-memory Databases – databases that rely primarily on main memory as their data store
- Columnar Data Stores – data stores that arrange data by column instead of by row, enabling faster access for analytic applications
- Stream Processing – real-time computation systems that filter and analyze huge volumes of data and events at very low latency
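To make the first bucket a bit more concrete, here is a minimal, single-process Python sketch of the classic MapReduce word-count job. This is only an illustration of the programming model – a real Hadoop job would distribute each of these phases across a cluster of commodity machines.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big volume", "data velocity"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
# counts == {"big": 2, "data": 2, "volume": 1, "velocity": 1}
```

The appeal of the model is that the map and reduce phases are embarrassingly parallel, which is exactly what lets Hadoop scale the “volume” V across cheap hardware.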
There’s more information here about these technologies, but at the end of the day, if we need to map them to the 3 V’s, we will get the following table:
These technologies are in constant flux, and some of them are actually a bucket of multiple different solutions – so there’s no need to nit-pick as to the table’s accuracy. It’s here just to give some indication of where each technology fits best.
What is interesting to see is how the Hadoop ecosystem is transitioning from solving the problem of volume to tackling velocity with Hadoop itself. Each Hadoop distribution vendor has its own homegrown approach to speeding up Hadoop. One such initiative is Hortonworks’ publicized attempt to increase the performance of Hadoop SQL querying a hundredfold.
If there are two things to remember about big data, they are:
- Big Data is a moving target – The data processing market is changing around us. Try to stay up to date with the trends and changes.
- There’s no silver bullet – As with anything else, there’s no “one-size-fits-all” solution. In fact, most enterprises end up using multiple big data technologies to solve their problems.
If you are looking for a solution to a big data problem, start by defining which of the big data V’s are causing you a headache and move on from there to select the set of technologies to use. If I had to guess, I’d say you’ll end up using more than one.
Picture credit: Big Data/ Shutterstock