I follow up my previous post with questions that you should be asking yourself when it comes to getting more out of the data your organization probably isn’t using – unstructured data.
David Hodgson, March 3, 2015
Where is unstructured data? As they said about 60s radio series character Chickenman, “He’s everywhere!”
It’s inside your organization, right under your nose. It’s also outside your organization: some of it ripe for the picking like low-hanging fruit, and some of it in strange places where it needs a degree of pre-processing and parsing.
In my last post I talked about the power of combining structured and unstructured data to unlock the business value made possible by recent revolutions in data analytics technologies. But what is unstructured data? If 80 to 90 percent of the data that you have today is unstructured, what is it? Where is it? How can it be used? And how can you get more?
Given that 80 percent of the valuable business data that most companies use today is structured data, most companies are not getting business value from the majority of the data available to them right now.
You must accept this challenge: the winners in the application economy will be those that find business value in unstructured data and use it in combination with the structured data that undergirds their existing mission-critical business systems.
Being the ‘biggest loser’ is not a positive statement in the world of Big Data!
The big picture: what is it and where is it?
What is it? Any source of information that lacks a defined format or structure intended for generalized data processing as rows and columns is probably unstructured or semi-structured data, and it could be valuable to you. That includes a report, a log, an image, a form, or any sort of document or file.
Even Excel files that are visually organized in rows and columns are considered semi-structured data for the purposes of this discussion, because only the Excel application knows how to do anything with them. In the big picture, that isn’t very useful.
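To make the distinction concrete, here is a minimal Python sketch that takes one of the sources named above, a log, and turns each line into a row with named columns. The log layout (date, time, level, free text), the pattern and the function name `log_line_to_row` are all assumptions made up for illustration; real logs vary widely and would need their own patterns.

```python
import re
from typing import Optional

# Hypothetical log layout, assumed for illustration only:
# "<date> <time> <LEVEL> <free text message>"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)$"
)


def log_line_to_row(line: str) -> Optional[dict]:
    """Return the line as a dict of named columns, or None if it does not fit."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


lines = [
    "2015-03-03 09:30:00 ERROR payment gateway timeout for order 1182",
    "free-form note with no recognizable structure",
]
rows = [log_line_to_row(line) for line in lines]
```

The first line fits the assumed layout and becomes a row with `date`, `time`, `level` and `message` columns. The second does not fit and comes back as `None`, which is the part that needs other techniques.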
Inside your organization, think of where the most valuable, prescient data really is: could it be in notes people take, emails that people exchange, Excel spreadsheets they create, logs of their activity, CRM records or social media interactions with customers?
Sure, the reference data in your business systems is critical, but the data that is driving daily business decisions and longer-term strategic decisions may be elsewhere. Could you access that? Would it be valuable if you could?
Outside your organization, where is the data that describes your adjacent markets, or the next innovation that you will (or should) either create or capitalize on? Where is the low-hanging fruit? Only you really know, but could it be on news websites, in discussions on social media, in stock price reports or in SEC filings on company websites? Could it be on a government website that lists foreclosures or competitive bids, or in weather reports or news reports? It could be anywhere accessible via the Internet.
Sure, your employees could read all this stuff and process it mentally to your advantage, but can they really, and do they? How could you get hold of it in an automated way if it were valuable? This is the low-hanging fruit: it is usually available through published APIs, available for purchase, or collectable using simple and free open source tools.
The big secret: how do I get hold of it and where do I keep it?
Unstructured data usually requires new tools and processes to extract intelligence and deliver business value. Absent structure, you need ways to extract or create context and metadata about the data: what it is about, when it was created, and by whom.
For the purposes we are discussing, this metadata cannot be created manually. To be useful, these processes need to be scalable and run in real time. They also need to be relevant to the business, and they can’t cost more than the value derived from them.
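As a rough illustration of creating that metadata automatically, the Python sketch below pulls author, creation time and a crude "what is it about" signal out of a raw email using only the standard library. The sample message, the tiny stop-word list and the function name `extract_metadata` are hypothetical, and counting frequent words is a stand-in for real text analytics, not a recommendation.

```python
import re
from collections import Counter
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "and", "with", "for", "are", "that", "this", "our"}


def extract_metadata(raw_email: str, top_n: int = 2) -> dict:
    """Derive who, when and a rough 'about what' from one plain-text email."""
    msg = message_from_string(raw_email)
    body = msg.get_payload()  # a plain string for non-multipart messages
    words = re.findall(r"[a-z]+", body.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return {
        "author": parseaddr(msg["From"])[1],           # by whom
        "created": parsedate_to_datetime(msg["Date"]),  # when
        "subject": msg["Subject"],
        "keywords": [w for w, _ in counts.most_common(top_n)],  # about what
    }


# Made-up sample message, not real data.
SAMPLE = """From: Pat Example <pat@example.com>
Date: Tue, 03 Mar 2015 09:30:00 +0000
Subject: Renewal risk

The customer is unhappy with renewal pricing. Renewal pricing is the blocker.
"""

metadata = extract_metadata(SAMPLE)
```

Because nothing here needs a person in the loop, the same function could be run over every message as it arrives, which is the kind of scalability the paragraph above asks for.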
Enter the magic of open source tools and commodity processing power, either on-premises or in the cloud. Without these ingredients you would not be able to get hold of Big Data or store it in a cost-effective way.
Forget your conventional data warehouses: while they’re not going away, they’re also not your go-forward tools. In the age of the Internet of Things (IoT), you will be looking at one or several of the new file systems and so-called NoSQL databases that are available today.
The table below gives some idea of the popular offerings available and what you might use them for. What is it you want to do?
How can I be the ‘biggest winner’?
You can’t be the ‘biggest winner’ without asking the right questions and finding the answers. All the questions above, along with the specific questions for your business, will help you uncover your own secret sauce. Will you mine data others have collected, or create a new collection for yourself?
Take a look at what other companies have done:
Twitter has got millions of people to enter their thoughts on every subject under the sun. LinkedIn has got people to enter their career summaries and their contacts. Nike found personal health data. Some clever electronic medical records vendors have found drug usage data. Every web-commerce site is a potential source of profiling data and geospatial data.
To ensure you don’t get left behind, ask the questions, get engaged with the potential for your business and carve out your winning, differentiated position in the application economy.
After all, the answers are everywhere.