The key to success in the digital age is transforming your business model by leveraging data in a way that either improves your operations or adds value to your customers. Unlike money, much of the data that makes the world go round is free! Where can you get it and how could you use it?
David Hodgson, March 7, 2016
A year ago I posted on the idea that “data is everywhere”. Since then the world has gained more data, greater accessibility and better tools. Read that old post if you have time, because it will still add something to what you learn here if you are a newcomer to Big Data Analytics.
Of course, your company has been using data for years, and doing analysis and reporting on it too. With an explosion of new analytical tools and the low cost of cloud-based facilities and storage, the world has discovered that the business data you have had in structured databases for years is just the tip of the iceberg. With so many recreational and business activities now conducted online, the world is generating far more data, and this data, often very unstructured and fragmented, is becoming both accessible and useful.
This data might in fact be in-house, in system logs and Excel spreadsheets, i.e. not readily available for large-scale analysis. But it might also be outside the company: a new asset that you could acquire and derive value from. It could be as crazy-sounding as data from social media feeds (like Twitter or Facebook), or it could be more down to earth, in the form of semi-structured lists. Most interesting is that some of it could be free!
To find some of this “new” data you might start by just googling “free data sources”. This will yield many references to follow up, but some of the richest data and most useful to business is provided by government agencies. The list below will lead you to a wide variety of examples.
Sources of Free Data from Government Agencies
But how do I use it?
Using data from these or other sources is a multi-stage process. First you have to access the data and bring it into your repository, maybe Hadoop. Then you must clean it up or reformat it so that it fits your use (key-value pairs, JSON, structured, semi-structured etc.). Lastly you have to create the analysis and visualization that will give you the valuable insights you seek, often by pairing it with your existing structured data.
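To make the clean-up and pairing stages a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample grant rows, the column names and the in-house customer counts are invented, and a real project would read from your repository rather than from an inline string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export from a free data source (invented sample rows):
# inconsistent casing, stray spaces, thousands separators and a blank value.
RAW_GRANTS_CSV = """county, grant_amount ,year
 Alameda ,"12,500",2015
alameda,7500,2015
Fresno,,2015
FRESNO,"3,000",2015
"""

# Hypothetical in-house structured data: number of customers per county.
CUSTOMERS_BY_COUNTY = {"alameda": 40, "fresno": 10}


def clean_rows(raw_text):
    """Clean-up stage: normalize headers and values, skip rows with no amount."""
    reader = csv.reader(io.StringIO(raw_text))
    header = [name.strip() for name in next(reader)]
    records = []
    for row in reader:
        record = dict(zip(header, (value.strip() for value in row)))
        amount = record["grant_amount"].replace(",", "")
        if not amount:
            continue  # unusable row: no amount recorded
        records.append({
            "county": record["county"].lower(),
            "grant_amount": float(amount),
            "year": int(record["year"]),
        })
    return records


def total_by_county(records):
    """Analysis stage: total grant money per county."""
    totals = defaultdict(float)
    for record in records:
        totals[record["county"]] += record["grant_amount"]
    return dict(totals)


def grants_per_customer(totals, customers):
    """Pairing stage: combine the external totals with in-house customer counts."""
    return {
        county: totals[county] / customers[county]
        for county in totals
        if customers.get(county)
    }


records = clean_rows(RAW_GRANTS_CSV)
totals = total_by_county(records)
print(grants_per_customer(totals, CUSTOMERS_BY_COUNTY))
```

The point of the sketch is the shape of the work, not the arithmetic: most of the code deals with tidying the external data before the one-line calculation that pairs it with data you already hold.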
Accessing data could be a simple Extract, Transform & Load (ETL) process if you have access to the source database. Often, though, the data you want is only accessible through a defined Application Program Interface (API). The Socrata link above details APIs for all of the thousands of data sources listed (e.g. CA Prop 39 grants).
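As a rough illustration of what "accessing data through an API" looks like, the sketch below builds a request URL in the style of Socrata's open data API, which serves datasets as JSON and accepts query parameters such as `$limit`, `$offset` and `$where`. The domain `data.example.gov` and the dataset identifier `abcd-1234` are placeholders, not real endpoints; substitute the values shown on the dataset page you are interested in.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_soda_url(domain, dataset_id, limit=1000, offset=0, where=None):
    """Build a Socrata-style JSON endpoint URL for one page of rows."""
    params = {"$limit": limit, "$offset": offset}
    if where:
        params["$where"] = where  # filter expression, e.g. "year = 2015"
    # safe="$" keeps the parameter names readable instead of encoding them as %24
    query = urlencode(params, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


def fetch_page(url, timeout=30):
    """Download one page and decode it into a list of dictionaries."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Placeholder domain and dataset id, used here only to show the URL shape.
example_url = build_soda_url("data.example.gov", "abcd-1234", limit=100)
print(example_url)
```

Calling `fetch_page(example_url)` against a real domain and dataset id would return the rows as Python dictionaries, ready for the clean-up stage. Paging through a large dataset is then a matter of increasing `offset` by `limit` until a page comes back empty.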
The API is the underpinning of most apps in the digital economy and the way apps interact or are integrated. Having a programming team that can use publicly available APIs to access data is an essential step for you. One of the hubs of information on APIs is the Programmable Web site. If you are building your analytics app on a hosted platform like AWS, Azure or Google, those providers have databases of useful data that you can access for free through their APIs. For example, see the list of AWS public datasets.
But how could any of this be useful to me?
Firstly you need to educate yourself about the potential for Big Data analytics and the availability of data from numerous sources. You don’t know what you don’t know until you start to look at what is going on. I have heard stories of banks detecting ATM weaknesses in their competitors through social media feeds and using that knowledge to target new customers. I have also heard of an insurance company that gets a red flag on a fraudulent claim by spotting that the two opposing parties in the claim have been Facebook friends for years.
Once you have the general picture, it’s the questions you want to answer that are more important than finding data. Step back and think about questions that, if you could answer them, would give you some sort of operational advantage or competitive edge. Make sure that you know what you would do with the answers if you had them! Then hire a good data scientist or two and start a first project to amaze yourself. They will find the data to answer the questions and work out how to do the analysis and visualization.
What can go wrong here?
Many things could go wrong, but I will highlight three:
1. You spend too much and go over budget
There’s no such thing as a free lunch, right? Processing even free data is going to be time-consuming and costly as you hire the right people, buy the right tools and experiment to get a useful result.
2. You underuse the data and fail to get full value
The real value of your analytics will be realized as you allow business users to interact with the data in the context of making day-to-day decisions. Limiting access just to the data scientists who started the initiative will be a big mistake.
3. You forget the need for security & compliance
Probably the biggest thing that can go wrong, especially as you expand access to business users, is falling foul of privacy and compliance regulations. For Hadoop-based systems you will need to leverage security systems like Ranger, Sentry and Knox to pass user information up to the application layer and enforce access authorization based on data lineage.
Data really does make our digital world go round. If you haven’t yet found new data sources and new ways to use data to supercharge your business, then you may be behind your competitors. Get going: find a project to start a “Big Data” initiative and begin your own Digital Transformation.
Image credit: Unknown origin