“Big Data” has become a phrase that sets itself apart in the tech industry. Unlike a host of other technologies that rise to the spotlight as buzzwords, only to die in the internet’s disposal bin a few years later, big data is actually useful. It is used to make stock market predictions, to detect fake news and, at fintech companies, to prevent fraud.
Big data has become so useful that merely having your company associated with the term seemingly makes it more valuable.
With all the success stories we hear, is everything so clear-cut? Do all businesses need to integrate big data into their business models if they want to survive the next great internet purge?
Behind the smoke and mirrors.
The sad truth about the big data trend, like so many others that make it to the mainstream, is that a lot of the hype is misplaced. The first headwind companies run into is the fact that data itself has no inherent value; its value comes from how it’s used.
Big data involves collecting a mixture of structured and unstructured data and finding a way to convert it into something useful. A lot of startups looking to join the hype around big data collect the data and adopt a platform, only to find they don’t have any real problem to solve. It becomes a classic case of putting the cart before the horse.
So, why is making use of big data such a difficult task? In a nutshell, it’s because running a data-centric organization isn’t about the size of the data. There’s no direct correlation between the amount of data you collect and your relative success in your field. Success depends far more on how usable the data you collect is, whether it’s big or small.
Rather than obsess over how much data you have, it makes more sense to collect the data you need and use it to make sound business decisions. Even small amounts of data can generate insights that have a considerable impact on your business in the future.
Logistics of running a big data platform.
Large companies such as Amazon and Google can experiment with big data in parts of their organization, so that even if an initiative fails, it doesn’t permanently damage their operations.
As you might imagine, startups and SMEs can’t afford such a luxury. Running a big data platform has some serious financial and operational implications a lot of businesses aren’t in a position to deal with.
To illustrate, consider two technologies that continue to generate a lot of headlines in the data science world today: Hadoop vs Spark. Hadoop is the original big data framework, and Spark is the quiet kid who hit puberty and suddenly had everyone’s attention.
Hadoop is infamously difficult to set up – even a single cluster may require days’ worth of editing and tweaking before you finally get it right. This will end up costing the company money, time and human resources better spent elsewhere.
Spark rose to fame partly because its API hides much of the complexity of writing and running Hadoop MapReduce jobs. It doesn’t strictly require Hadoop either (it can run on its own standalone cluster manager), so a lot of developers end up dropping Hadoop altogether.
Spark runs most of its processing in memory, which for some workloads makes it up to 100x faster than Hadoop MapReduce. The catch is that Spark clusters can be extremely expensive to run, because RAM isn’t cheap.
Getting the right data for the job.
The above argument shouldn’t imply that big data in its present state is useless. Far from it. In the search for the right kind of data to maximize your company’s potential, the right data might be big, or it might be small. What gives you the competitive edge is finding the data that is so critical it places you in a better position than rivals.
You can decide whether or not you need big data in a relatively simple 3-step process:
Understand the problem you want to solve: The first step is to understand the problem well enough to choose the right approach to solving it. Is it a big data problem or not?
A fundamental problem with big data is the way it was named. Rather than emphasize all three parameters that define it (volume, variety and velocity), the name focuses entirely on the first one.
Just because you deal with a large amount of data doesn’t mean you are dealing with big data. And even if you are, your problem may not call for Hadoop or Spark.
For instance, if you store terabytes of information in a data warehouse that needs to be queried every so often, you don’t have a big data problem per se.
Just because your company has been around for a while and has amassed a wealth of information doesn’t make it a big data firm. It also depends on how varied and complex your data and queries are (variety) and how quickly they come in (velocity).
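To make the three-parameter check concrete, here is a minimal Python sketch. The `Workload` fields, the threshold values and the “at least two of the three” rule are all hypothetical, chosen purely for illustration rather than taken from any standard; real cut-offs depend on your hardware, budget and query patterns. Here the second parameter is modelled simply as the number of structurally different data formats you handle.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; tune them to your own setup.
VOLUME_TB_THRESHOLD = 100.0        # total stored data, in terabytes
VELOCITY_EVENTS_PER_SEC = 10_000   # sustained rate of incoming data or queries
VARIETY_FORMAT_THRESHOLD = 5       # distinct data formats (tables, logs, text, images...)


@dataclass
class Workload:
    volume_tb: float        # how much data you store, in terabytes
    events_per_sec: float   # how quickly new data or queries arrive
    distinct_formats: int   # how many structurally different kinds of data you handle


def big_data_signals(w: Workload) -> list[str]:
    """Return which of the three parameters exceed the hypothetical thresholds."""
    signals = []
    if w.volume_tb >= VOLUME_TB_THRESHOLD:
        signals.append("volume")
    if w.events_per_sec >= VELOCITY_EVENTS_PER_SEC:
        signals.append("velocity")
    if w.distinct_formats >= VARIETY_FORMAT_THRESHOLD:
        signals.append("variety")
    return signals


def needs_big_data_stack(w: Workload) -> bool:
    """Illustrative rule: one large parameter on its own (e.g. volume) is not enough."""
    return len(big_data_signals(w)) >= 2
```

With these made-up numbers, a 200 TB warehouse holding a single kind of structured data and queried a few times per second, `Workload(200, 5, 1)`, flags only volume, so `needs_big_data_stack` returns `False`, in line with the data warehouse example above.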
Decide the kind of data you need: There are three separate factors to take into consideration when deciding what kind of data to collect:
- Who the target audience is.
- What their preferences are.
- What motivates them to use your platform.
Uber is a good example of a company that amasses a lot of data each day but could realistically serve its customers with or without big data. Its customers are people who want a quick ride, so personal information such as location is necessary.
Customer preferences include where they want to be picked up and dropped off, and they are motivated by convenience and lower prices.
Deliver relevant information: Ultimately, the value of your data depends on how you use it. A common way of using this data is delivering relevant information to the customer. Inspire your customers to keep using your platform by presenting them with information that shows you understand them.
This kind of personalization is usually based on physical location, personal interests and environmental factors. You don’t necessarily need big data to implement such a solution.
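As an illustration that location- and interest-based personalization can run on small data, the sketch below ranks nearby offers for a single customer using only the Python standard library. The `Offer` record, the 2 km default radius and the interest-matching rule are hypothetical. The distance calculation is the standard haversine formula for great-circle distance on a sphere.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Offer:
    # Hypothetical record describing something you might show a customer.
    name: str
    lat: float
    lon: float
    category: str


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def relevant_offers(user_lat: float, user_lon: float, interests: set[str],
                    offers: list[Offer], radius_km: float = 2.0) -> list[Offer]:
    """Offers within radius_km that match one of the user's interests, nearest first."""
    scored = []
    for offer in offers:
        if offer.category not in interests:
            continue  # skip offers the customer has shown no interest in
        distance = haversine_km(user_lat, user_lon, offer.lat, offer.lon)
        if distance <= radius_km:
            scored.append((distance, offer))
    # Sort on distance only, so Offer objects never need to be compared.
    scored.sort(key=lambda pair: pair[0])
    return [offer for _, offer in scored]
```

A linear scan like this is fine for thousands of offers; only at much larger scales would you need a spatial index or a distributed engine.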