The following, taken from a whitepaper titled “Structuring of Unstructured Data for Text Analytics”, describes the challenges of unstructured data in a clear and simple way.
Structured data formats are usually record-oriented.
Transactions are stored according to predefined data models, which makes them easy to query, analyse, and integrate with other structured data sources. Structured data models take the form of tables with relationships among them, so it is easy to create reports from these tables. Unstructured data is the opposite: free-form text is more difficult to query, search, and extract information from, and it complicates integration with other data sources.
Sources of unstructured information that are of interest to enterprises include emails, news feeds and blog articles; contact-center notes and transcripts; surveys, feedback forms and warranty claims; and every kind of corporate document imaginable. These days, companies rely heavily on relational or transactional data for decision making and business analysis. Data coming from unstructured sources is not analysed, which leads to business risk.
Text mining is one of the techniques used to find relationships in textual data. Since analysts want facts and answers to questions, textual sources need to be given a structure comparable to that of structured sources.
It is very important to support decision making on unstructured data by finding relations and providing knowledge. Text analytics is the answer to the ‘unstructured data challenge’.
Source: Shilpa B.L., Harish G., “Structuring of Unstructured Data for Text Analytics”, 2003 (paper)
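To make the quoted point concrete, here is a minimal sketch of what "taming" free-form text into records can look like. It is not the method from the whitepaper. The sample notes, the field names (`date`, `order_id`, `request`) and the helper `to_record` are all invented for illustration, and real text analytics would need far more than a couple of regular expressions.

```python
import re

# Hypothetical free-form contact-center notes (invented for illustration).
NOTES = [
    "Customer called on 2013-02-11 about order 48213; screen cracked, wants refund.",
    "2013-02-14: order 50177 arrived late, customer asked for a replacement.",
    "General feedback, no order mentioned. Caller was happy with support.",
]

# Assumed conventions: ISO-style dates and the word "order" followed by digits.
DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
ORDER = re.compile(r"\border\s+(\d+)\b", re.IGNORECASE)
REQUESTS = ("refund", "replacement")


def to_record(note):
    """Turn one free-form note into a fixed-field record (None where missing)."""
    date = DATE.search(note)
    order = ORDER.search(note)
    lowered = note.lower()
    # First known request keyword found in the note, if any.
    request = next((r for r in REQUESTS if r in lowered), None)
    return {
        "date": date.group(1) if date else None,
        "order_id": int(order.group(1)) if order else None,
        "request": request,
    }


records = [to_record(n) for n in NOTES]

# Once the text is record-oriented, a question becomes a simple filter.
refund_orders = [r["order_id"] for r in records if r["request"] == "refund"]
print(refund_orders)
```

The third note shows the limit of this approach: nothing matches, so every field is empty and the information in the note stays outside the structured view.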