What are Structured and Unstructured Data?
In the previous post, Bantu Tech introduced Big Data and mentioned two key terms: structured and unstructured data. Broadly, structured data is information with a high degree of organization, so that it fits seamlessly into a relational database and can be readily searched with simple, straightforward queries or other search operations. Unstructured data is essentially the opposite.
Structured Data is information which is highly organized, such that it can be incorporated into a relational database seamlessly. Data contained in relational databases or spreadsheets would be classified as structured data. The benefits of structured data are that it can be easily processed, manipulated, queried and analysed. The standard way of managing structured data is SQL (Structured Query Language), described by WebOpedia as “a programming language created for managing and querying data in relational database management systems.” A prime example of the use of structured data is the retrieval of a person’s name from the field “name” in a database. Because there is structure, it is relatively simple to access the required information.
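To make the “name” field example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table `people`, its columns and the sample rows are invented purely for illustration; a real system would have its own schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical 'people' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO people (name, city) VALUES (?, ?)",
        [("Amina", "Lagos"), ("Thabo", "Durban"), ("Chidi", "Lagos")],
    )
    conn.commit()
    return conn


def find_names(conn, city):
    """Return the 'name' field of every person in the given city, sorted."""
    rows = conn.execute(
        "SELECT name FROM people WHERE city = ? ORDER BY name", (city,)
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(find_names(conn, "Lagos"))  # ['Amina', 'Chidi']
```

Because every row has the same named fields, a single SQL statement is enough to pull out exactly the value that is needed.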
Structured data is taking on a new role in the world of big data. The IoT (Internet of Things), amongst other technological developments, is producing new sources of structured data, usually in real time and in very large volumes. This data can be divided into two categories:
- Computer- or machine-generated: This refers to data that is generated without human intervention.
- Human-generated: This is data entered and generated through human interaction.
Unstructured Data is the opposite of structured data and can be characterized as information that doesn’t reside in a traditional database. Files such as multimedia content, emails, videos, presentations and web pages can be classed as unstructured. Emails, whilst often organized by date and time, are treated as unstructured because the message body is free text that follows no predefined data model. Internally they have some structure, but as they cannot fit seamlessly into a database, they are classed as “unstructured”. According to WebOpedia “Experts estimate that 80 to 90 percent of the data in any organization is unstructured. And the amount in enterprises is growing significantly – often many times faster than structured databases are growing.”
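The email case can be illustrated with a short sketch using Python's standard `email` package. The message text and addresses below are made up for the example. The headers come out as neat key/value pairs, but the body is just a block of prose that no database column can meaningfully break down without further analysis.

```python
from email import message_from_string

# A hypothetical message, invented for this illustration.
RAW_EMAIL = """\
From: amina@example.com
To: thabo@example.com
Subject: Quarterly figures
Date: Mon, 06 Jan 2020 09:30:00 +0000

Hi Thabo, sales in the north were up a bit, but I am worried about returns.
"""


def split_email(raw):
    """Separate the structured headers from the free-text body."""
    msg = message_from_string(raw)
    headers = {key: msg[key] for key in ("From", "To", "Subject", "Date")}
    body = msg.get_payload().strip()
    return headers, body


headers, body = split_email(RAW_EMAIL)
print(headers["Subject"])  # Quarterly figures
print(body)
```

The headers could be stored in a table easily enough; working out what the body actually says (here, a concern about returns) is the part that needs text analysis.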
When a business looks to big data for situational awareness and to add value to the organisation, three components are used together:
- Data Warehousing Technology
- Business Intelligence (BI)
- Data Science
Data Warehousing Technology
With the modern digitization of information, the volume of data available to an organisation has increased, with some businesses holding information stores at petabyte scale. Data warehousing at this scale faces the challenge of processing all the data efficiently and economically. The data must be processed quickly enough to make it actionable, so distributed processing frameworks such as Hadoop/MapReduce are used.
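As a rough illustration of the MapReduce idea, the sketch below counts words in three steps: map, shuffle and reduce. It runs in a single Python process and does not use the Hadoop API; the function names are chosen for this example only. In a real cluster, the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


print(word_count(["big data", "Big value"]))
# {'big': 2, 'data': 1, 'value': 1}
```

The point of splitting the work this way is that each document can be mapped independently, and each key can be reduced independently, which is what lets the job be spread over many machines.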
Business Intelligence (BI)
The capability of incorporating big data into an organisation to make business decisions is called BI (Business Intelligence). It is described by Frank Lo as “an important link between the data warehouse and business leaders/business analysts, enabling full transparency in the nuance of what is going on in the business.” www.Gartner.com describes BI as “an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.”
The team responsible for Business Intelligence usually maintains tools that relay the data to end users through the following mediums:
- OLAP (Online Analytical Processing) tools
- Real-time measurements
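One of the basic operations an OLAP tool offers is the "roll-up", where detailed records are summarised along one or more dimensions. The sketch below imitates that in plain Python; the sales figures, the dimension names and the `roll_up` helper are all hypothetical and are not taken from any particular BI product.

```python
from collections import defaultdict

# Hypothetical fact records: each row has two dimensions and one measure.
SALES = [
    {"region": "North", "quarter": "Q1", "amount": 120},
    {"region": "North", "quarter": "Q2", "amount": 80},
    {"region": "South", "quarter": "Q1", "amount": 90},
    {"region": "South", "quarter": "Q2", "amount": 60},
]


def roll_up(facts, dimensions, measure):
    """Sum the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[dimension] for dimension in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


print(roll_up(SALES, ["region"], "amount"))
# {('North',): 200, ('South',): 150}
print(roll_up(SALES, ["quarter"], "amount"))
# {('Q1',): 210, ('Q2',): 140}
```

Changing the list of dimensions gives a different view of the same data, which is the kind of slicing a business analyst would do through a BI front end.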
Data Science
Data Science involves predictive modelling, causal inference and pattern recognition to learn more from data. It involves data mining to solve analytical problems around the business and its data. The end goal of data science is described by Frank Lo as “providing value through discovery by turning information into gold.”