“Water, water everywhere… nor any drop to drink”. This is a line from The Rime of the Ancient Mariner by Samuel Taylor Coleridge. The speaker, a sailor on a becalmed ship, is surrounded by saltwater that he cannot drink. This could be used as a metaphor for big data. We are surrounded by it… and some might even say drowning in it, but to what purpose? Stay tuned as we define, explore, and characterize big data as we turn it into information AV integrators can use. Of course, it begins with understanding exactly what this thing called big data is.
There seem to be as many definitions of “big data” as there are businesses and individuals hoping to benefit from it.
Overall, big data describes the exponential growth in the volume and availability of data throughout our world. Ultimately, big data refers to extremely large data sets.
IBM maintains that businesses around the world generate nearly 2.5 quintillion bytes of data daily! Almost 90% of this global data has been produced in the last 2 years alone.
A more formal definition comes from the National Institute of Standards and Technology (NIST). They defined big data as consisting of “extensive datasets—primarily in the characteristics of volume, velocity, and/or variability—that require a scalable architecture for efficient storage, manipulation, and analysis.”
So now we know that big data is big… but where does it come from?
Big data seemingly comes from everywhere. It can be generated by everything we interact with and are connected to. It even can be created when we are out walking (or driving) and think we are not connected.
Can we all say ‘cameras and sensors’? It comes from business transactions, loyalty programs, customer databases, medical and government records, internet transactions and clicks, mobile applications, social networks, research repositories, and machine-generated data and real-time data sensors used in Internet of Things (IoT) connectivity to name a few.
The data may be left in its raw form or preprocessed with data mining tools or data preparation software so it’s ready for analytics.
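As a rough illustration of that preparation step, here is a minimal sketch in plain Python (the sample records are invented for illustration, and this stands in for what dedicated data preparation software would do at scale): it drops incomplete rows and normalizes fields so raw records are ready for analysis.

```python
# Hypothetical raw transaction records: missing values and inconsistent
# formatting are typical of data straight from the source.
raw_records = [
    {"customer": " Alice ", "amount": "42.50", "store": "NYC"},
    {"customer": "Bob", "amount": "", "store": "LA"},        # missing amount
    {"customer": "carol", "amount": "19.99", "store": "NYC"},
]

def prepare(records):
    """Drop incomplete rows, normalize names, and convert amounts to numbers."""
    cleaned = []
    for rec in records:
        if not rec["amount"]:          # discard rows with missing values
            continue
        cleaned.append({
            "customer": rec["customer"].strip().title(),
            "amount": float(rec["amount"]),
            "store": rec["store"],
        })
    return cleaned

print(prepare(raw_records))  # two clean rows remain, ready for analytics
```

Real preparation pipelines add many more steps (deduplication, type validation, outlier handling), but the shape is the same: raw data in, analysis-ready data out.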
Big data can be categorized into three basic types:
- Structured data is quantitative in nature and refers to a fixed format like a database or Excel spreadsheet. In this form it is easy to use, analyze, distribute, and repurpose as needed.
- Semi-structured data does not conform to the rigid format of relational databases or spreadsheets but contains some level of organization through semantic elements like tags. For instance, consider HTML, which does not restrict the amount of information you can collect in a document but does enforce a certain hierarchy or structure.
- Unstructured data is qualitative and lacks any specific form or structure. Email, social media, word processing, and video files are examples. The lack of structure makes it very difficult and time-consuming to process and analyze. Over 80% (and growing) of big data falls into this category.
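The three categories can be made concrete with a short Python sketch (the sample records below are made up for illustration). Note how access gets harder as structure disappears:

```python
import json
import re

# Structured: fixed schema, like a row in a database table or spreadsheet.
structured = {"id": 101, "product": "HDMI cable", "price": 12.99}

# Semi-structured: self-describing via keys/tags (JSON here, analogous to
# HTML's tags); fields can nest and vary from record to record.
semi_structured = json.loads(
    '{"id": 102, "tags": ["AV", "cable"], "specs": {"length_m": 2}}'
)

# Unstructured: free text with no inherent schema; extracting meaning
# requires parsing or analysis (a crude keyword search here).
unstructured = "Customer emailed: the HDMI cable arrived late but works great."

print(structured["price"])                     # direct field access
print(semi_structured["specs"]["length_m"])    # navigate the hierarchy
print(bool(re.search(r"late", unstructured)))  # search, since no fields exist
```

With structured data you simply look a value up; with unstructured data you have to go find it, which is exactly why the 80%-plus of big data that is unstructured is so costly to analyze.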
Let’s explore the characteristics of big data. In 2001, industry analyst Doug Laney defined the characteristics of big data, labeling them the “Three Vs”.
They were volume, velocity, and variety:
- Volume: The unprecedented explosion of data gathering means the digital universe will reach 180 zettabytes (a zettabyte is a 1 followed by 21 zeroes) by 2025. Humans produce 2.5 quintillion bytes of data every day. The challenge is not so much the amount of data but what to do with it.
- Velocity: Data is generated at an ever-accelerating pace. Every day, Google receives over 3.5 billion search queries. Globally, as of 2019, a staggering 293.6 billion emails were sent each day, and there are now over 4 billion email users worldwide. The challenge for data scientists is to find efficient ways to collect and process all this data for specific uses.
- Variety: Data comes in different forms, as noted previously. It might be structured, semi-structured, or, increasingly, unstructured.
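A toy sketch makes the velocity challenge tangible: when events arrive faster than you can store them, one common tactic is to process the stream in a single pass and keep only running aggregates. The sensor readings below are simulated for illustration.

```python
def stream_stats(events):
    """Compute count and running average in one pass, with constant memory."""
    count, total = 0, 0.0
    for value in events:
        count += 1
        total += value
    return count, (total / count if count else 0.0)

# Simulate a burst of a million incoming sensor readings (values cycle 20-24).
readings = (20.0 + (i % 5) for i in range(1_000_000))
count, avg = stream_stats(readings)
print(count, round(avg, 1))  # a million events summarized without storing them
```

The generator never materializes the full data set in memory; the same idea, scaled up, underlies stream-processing systems that handle high-velocity data.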
As you can imagine, a lot has happened since 2001 and the original Three Vs of big data.
Data scientists are adding several expanded characteristics:
- Veracity: This refers to the quality of the collected data. Think garbage in and garbage out. One expert noted, “As the world moves toward automated decision-making, where computers make choices instead of humans, it becomes imperative that organizations be able to trust the quality of the data”.
- Variability: Data’s meaning is constantly changing. One example is language processing by computers. It is complicated because words often have several meanings. Data scientists must account for this variability by creating sophisticated programs that understand context and meaning in all the variations possible.
- Visualization: Data must be understandable to its nontechnical consumers. Visualization is the creation of charts and graphs that tell the story, “transforming the data into information, information into insight, insight into knowledge, and knowledge into advantage”.
- Vulnerability: This is all about security: protecting big data while keeping it accessible to the appropriate people.
- Volatility: How long does big data need to be kept?
- Value: After addressing all the other characteristics, and weighing the cost to collect and assess your big data, you want to be sure your organization is actually getting value from it.
Companies can use accumulated big data in several ways: to improve operations, lower operating costs, and assess strengths and weaknesses in various areas.
In a sales environment, it can improve customer service or create personalized marketing campaigns based on specific customer preferences. No matter the application, it can lead to increased profitability.
Businesses that properly gather and utilize big data have a competitive advantage over those that do not since they’re able to make faster and more informed business decisions.
One McKinsey analyst noted, “Buried deep within this data are immense opportunities for organizations that have the talent and technology to transform their vast stores of data into actionable insight, improved decision making, and competitive advantage”.
No matter how you slice and dice your big data, one thing is sure: big data is here to stay, and it’s getting exponentially bigger day by day.
Every organization needs to understand and assess what big data means to them and what it can help them do. AV integrators need to facilitate those discussions with our clients to see where we fit. The possibilities and opportunities are truly endless.