.Conf2016: Leveraging the DNA of Digital Transformation


This was my first time attending a Splunk .Conf, so I was eager to get a feel for the event: to gauge the excitement about the products and get a sense of how Splunk might succeed with its ambitious plans for growth in an ever more competitive market.

David Hodgson, September 30, 2016

The family goes to Orlando

Boasting 3 days of in-depth training, 185 technical sessions, inspiring keynotes each day, and the booths of 70 technical partners, .Conf2016 did not disappoint the nearly 5,000 attendees in terms of the intensity of the event and the velocity of interactions between us all. Obviously CEO Doug Merritt had primed his troops, because in the kickoff keynote he quoted what they say internally at Splunk: “If you ever want to be inspired, go out and talk to a customer”.

The fervor that the Splunk user base feels for the product brings back memories of VMware and SAP when they were cool and promised change and progress. Perhaps because it’s a weird election cycle, Millennials are looking to technology rather than politics to shape their future. I don’t know for sure, but this conference definitely felt like the best sort of family gathering, where people actually liked each other, wanted to collaborate on building solutions and wanted to bring new members into the fold.

The conference was held at the Dolphin & Swan at Walt Disney World, Orlando. By the time we got to the Tuesday event night, a roaming party around the Hollywood Studios park, the atmosphere was very much that of one big family having fun together.

Learning how to get machines learning IT

Splunk is the clear market leader in providing a pragmatic platform for machine learning. The results have been real and beneficial, whether it’s detecting intrusions from unusual data access patterns or predicting trends that can be addressed to optimize IT service delivery. A big theme at .Conf2016 was the power of Machine Learning and how it is shaping Splunk’s products.

In practice Machine Learning is very different from what we usually think of as Artificial Intelligence. AI seeks to build computer models that can emulate the functions of human brains. We expect that an AI would perceive its environment and exhibit goal-seeking, purposeful behavior that is understood by humans. Ideally it would interact with humans to both receive input and augment our decision-making abilities. By contrast, Machine Learning is a sub-area of AI focused on pattern recognition that allows a system to “learn” and predict based on history, but without there being a rational explanation for that response that a human could understand. Machine Learning relies on consuming masses of granular data that can be processed with statistical analysis to make predictions and uncover “hidden insights” about relationships and trends. These “insights” are not necessarily causalities with an explanation that humans could understand and replicate.
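To make that distinction concrete, here is a minimal sketch in Python (my own toy illustration, not how Splunk or any product implements it): the “learning” is nothing more than reducing history to summary statistics, and the resulting “insight” is a flagged deviation with no human-readable reasoning behind it.

```python
from statistics import mean, stdev

def learn_baseline(history):
    """'Learn' from history by reducing it to summary statistics."""
    return mean(history), stdev(history)

def is_anomaly(value, baseline, k=3.0):
    """Flag values more than k standard deviations from the learned norm."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma

# Hourly login counts: the model has no idea what a "login" is,
# yet it can still flag an unusual access pattern.
logins_per_hour = [102, 98, 110, 95, 105, 99, 101, 97]
baseline = learn_baseline(logins_per_hour)
print(is_anomaly(340, baseline))  # sudden spike -> True
```

The model offers no explanation of *why* 340 is suspicious; it simply falls outside the learned pattern, which is exactly the point made above.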

As a solution, Splunk differentiates itself from similar platforms like the ELK stack (Elasticsearch, Logstash, Kibana) and Hadoop mainly through its functional completeness and ease of use. But it is proprietary and somewhat expensive to use, with costs scaling based on the amount of data ingested daily. To accommodate customers’ concerns about growing costs and their desire to embrace open-source technologies, Merritt announced at .Conf2016 that Splunk Labs was enabling integration with Elasticsearch, Spark, and Kafka, showing Splunk’s willingness to adapt to what customers are asking for in the field. The announcement was well received and is probably the answer both to customer needs and to how Splunk can ensure continued popularity.

From a Syncsort perspective, our Ironstream product has been focused on getting data to Splunk directly, but customers have increasingly asked us to support a Kafka pipe to split data between Splunk and Hadoop. With Splunk’s new open architecture announced at .Conf2016, we now plan to follow suit.

Splunking IT Operations

One of the significant areas of success for Splunk has been monitoring tools for IT infrastructure. The typical users are enterprise IT teams that need to monitor a broad array of platforms. They need to contextualize events by gathering data from connected platforms and using Splunk to do basic time-based correlation and advanced pattern recognition. The rate of environmental change in hardware, software and connected devices makes traditional tools almost impossible to integrate, and Splunk Enterprise offers a much simpler and more effective approach.

For the last two years Syncsort has partnered with Splunk to add the mainframe platform to those monitored, and this has proven to be an essential ingredient for some of the world’s biggest IT organizations that have mainframes.

On the first day, Merritt introduced the concept of data as the DNA of IT, driving evolution and change. On Wednesday, Andi Mann carried the theme further in his keynote “Re-Imagining IT”, saying:

“Digital transformation needs to be in your DNA; not passionately pursuing it is an existential challenge and threat to your individual and organization’s future success”.

Mann focused his discussion on the new 2.4 release of IT Service Intelligence (ITSI) that was unveiled at the conference. The main new capabilities of value are:

  • Anomaly detection using machine learning
  • Adaptive thresholds that tell you what the norms and thresholds should be for any time of the day, week, etc.
  • Intelligent events with contextualized data wrapped in them
  • End-to-end visibility of business services richly visualized for LOBs in the new “glass tables”
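The adaptive-threshold idea in the list above can be sketched in a few lines of Python (a toy illustration of the concept, not ITSI’s actual algorithm): norms are computed per time slot from history, so the same value can be normal at noon yet anomalous at 2 a.m.

```python
from collections import defaultdict
from statistics import mean, stdev

def adaptive_thresholds(samples, k=2.0):
    """samples: list of (hour_of_day, value) pairs from history.
    Returns a per-hour (low, high) band of 'normal' values."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    bands = {}
    for hour, values in by_hour.items():
        mu, sigma = mean(values), stdev(values)
        bands[hour] = (mu - k * sigma, mu + k * sigma)
    return bands

# Traffic is normally low overnight, so 200 requests at 2 a.m. is an
# anomaly even though the same volume is perfectly normal at noon.
history = [(2, v) for v in (10, 12, 9, 11)] + [(12, v) for v in (200, 210, 190, 205)]
bands = adaptive_thresholds(history)
low, high = bands[2]
print(200 > high)  # breaches the 2 a.m. band -> True
```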

At .conf2016, Andi Mann discussed Syncsort’s role in making Big Iron data available to Splunk for Big Data analytics.

Syncsort also unveiled our latest work: integration of mainframe data for ITSI 2.4. We demonstrated this with glass tables visualizing an online banking system, from a mobile device to a mainframe running CICS and DB2. The Syncsort ITSI module is available for download from Splunkbase at no cost.

Splunking Security

One of the most widely adopted use cases for Splunk is security and compliance. As usual, you can roll your own very effectively using the Splunk Enterprise platform, or you can add pre-built power features with Splunk’s premium app, Enterprise Security (ES).

In her keynote, Haiyan Song, SVP of Security Markets, described how alert-based security is no longer adequate and stated that Machine Learning is now required to address internal and external threats. Splunk’s answer is User Behavioral Analytics, or UBA.

At the conference Splunk announced new features in ES 4.5 and UBA 3.0 that were aimed at providing CISOs and their teams with operational intelligence. The highlights were:

  • The Adaptive Response initiative allowing partners to openly integrate SIEM technology
  • Glass tables available for advanced visualizations of the underlying data
  • Enterprise hardening for the Caspida acquisition to create UBA as a product

Song described how UBA has the ability to understand and correlate user sessions across platforms and devices. She also brought on Richard Stone from the UK Ministry of Defence, who explained how they are leveraging Splunk ES and UBA to create a DaaP (Defence as a Platform) ecosystem. To Stone, this is a single information environment that anyone with the appropriate credentials can access from any point, entering a familiar environment with access to any information. He challenged us to “Dare to Imagine”, saying that the biggest constraint in security is our imagination.

Syncsort again extends these solutions to the mainframe, offering data integration with ES for RACF via the Ironstream product.

Splunking DevOps

A new concept unveiled at .Conf2016 is a solution for DevOps. This is perhaps not surprising given Andi Mann’s background, and he will be the champion for this new product. The solution uses the underlying capabilities of Splunk Enterprise to take a data-integration approach to deliver three areas of value:

  • End-to-end visibility across every component in the DevOps tool chain
  • Metrics in glass tables to show LOBs that code meets quality SLAs
  • Correlation of business metrics with code changes to drive continual improvement

Splunking the Mainframe

One of the greatest things for me about the show was the number of people interested in the Syncsort booth. Even people who were not familiar with mainframes were interested to learn how we are Splunking the Mainframe!

Our CEO Josh Rogers delivered a phenomenal Cube interview explaining our strategy of moving data from Big Iron to Big Data (BIBD) platforms. Our deliverables and direction resonate with customers and prospects alike, who are as excited about what we are doing as they are about Splunk!

During his appearance on the Cube at .conf2016, Syncsort CEO Josh Rogers defined the Big Iron to Big Data (BIBD) challenge, where customers need to take core data assets created through transactional workloads on the mainframe and move them to next-generation environments for analytics.

With the pace that things are moving across this market, I am looking forward to returning to .Conf in 2017, when it will be held in Washington DC, my home town. I know that both Splunk and Syncsort will have learned more and developed more, inspired by our customers. I can’t wait to see what we will have co-created and what evolves next from the data-DNA of IT.

A Dream of Great Big Data Riches – Harvesting Mainframe Log Data


In today’s new world of big data analytics, traditional enterprise companies have jewels hidden within their walls, embedded in legacy systems. Among the most precious stones, but perhaps some of the best hidden, are the various forms of mainframe log data.

David Hodgson, June 20, 2016

z/OS system components, subsystems, applications and management tools continually issue messages, alerts and status and completion data, writing them to log files or making streams available via APIs. We are talking hundreds of thousands of data items every day, and far more from big systems. This “log data” generally comes under the heading of unstructured or semi-structured data and has not traditionally been seen as a resource of great value. In some cases it is archived for later manual research if required; in many cases it just disappears! In the case of SMF records, it has traditionally been consumed by expensive mainframe-based reporting products that unlock the value, but at great cost, and you still need special expertise to do anything with it.

What if all this potentially valuable data could be collected painlessly in real time, made usable by a simple query language and presented in easy-to-read visualizations for use by operational teams? This sounds like a fantasy, but it is what Syncsort and Splunk have achieved through their partnership and products.

Nuggets and gemstones

Of all the data sources we are talking about, SMF (System Management Facility) records are the richest trove, with over 150 different record types that can be collected. SMF provides valuable security and compliance data that can be used for intrusion detection, tracking of account usage, data movement tracking and data access pattern analysis. SMF also provides an abundance of availability and performance data for the z/OS operating system, applications, web servers, DB2, CICS, WebSphere and the MQ subsystem.

But there is much additional information in feeds like SYSLOG, RMF (Resource Measurement Facility) and Log4j. And there are the more open-ended sources that could be considered log data, like the SYSOUT reports from batch jobs.

The gem collector and now Lapidarist too

Syncsort’s solution for the collection of mainframe log data is called Ironstream, and it is a super-efficient pipeline for getting data into Splunk Enterprise or Splunk Cloud. Designed from the start to be lightweight with minimal CPU overhead, Ironstream is a data forwarder that converts log data into JSON field/value pairs for easy ingestion. We built it in direct response to Splunk customers who wanted to complete their enterprise IT picture with critical mainframe data for an end-to-end, 360° view.
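Conceptually the conversion looks something like this Python sketch (the record type and field names here are hypothetical illustrations, not Ironstream’s actual schema or code):

```python
import json

def to_json_event(record_type, fields):
    """Render a parsed mainframe log record as JSON field/value pairs,
    ready for ingestion by a downstream analytics platform.
    (Hypothetical field names -- not Ironstream's actual schema.)"""
    event = {"record_type": record_type}
    event.update(fields)
    return json.dumps(event, sort_keys=True)

# A made-up job-accounting record rendered as a JSON event.
event = to_json_event("SMF30", {"jobname": "PAYROLL1", "cpu_seconds": 4.2})
print(event)
```

Once the data is in this self-describing field/value form, any consumer can index and search it without knowing the original binary record layouts.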



In addition to all the data sources listed above, Ironstream offers access to any sequential file and to USS files. This gives very comprehensive coverage of any source of log data that an organization might be producing from an application. In addition, we offer an Ironstream API that any application can use to send data directly to Splunk if it’s not already writing it out somewhere.

Of course, something has to be too good to be true here, doesn’t it? Well, yes: one potential issue is the sheer volume of data that is available and the cost of storing it. While all of it could be valuable, most companies will want to focus selectively on the items that are most valuable to them now. To address this requirement, our Ironstream engineers became digital lapidarists. In the non-digital world, lapidarists are expert artisans who refine precious gemstones into wearable works of art. With the latest release of Ironstream, we now offer a filtering facility that allows you to refine the large volumes of mainframe data by selecting individual fields from records and discarding the rest. By customer request, we have on our roadmap an even more powerful “WHERE” select clause that will allow you to select data elements across records based upon subject or content.
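A rough sketch of the two filtering styles in Python, with hypothetical record and field names (an illustration of the idea, not Ironstream’s actual implementation): field-level selection as shipped today, and the record-level “WHERE”-style predicate on the roadmap.

```python
def select_fields(record, keep):
    """Field-level filtering: keep only the named fields, discard the rest."""
    return {k: v for k, v in record.items() if k in keep}

def where(records, predicate):
    """Content-based selection across records (the 'WHERE'-clause idea)."""
    return [r for r in records if predicate(r)]

records = [
    {"jobname": "PAYROLL1", "cpu_seconds": 4.2, "region": "PROD"},
    {"jobname": "TESTJOB1", "cpu_seconds": 0.3, "region": "TEST"},
]
# Trim each record down to the fields we actually want to pay to store...
slim = [select_fields(r, {"jobname", "cpu_seconds"}) for r in records]
# ...and select whole records by content.
prod_only = where(records, lambda r: r["region"] == "PROD")
print(slim[0], len(prod_only))
```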

Why didn’t I know this?

There is a fast moving disruption happening in the world of IT management and not everyone wants you to know it. Open source solutions and new analytical tools are changing everything.

For the last 40 years, complex point-management tools have been used by highly skilled mainframe personnel to keep mainframes running efficiently. Critical status messages are intercepted on their way to SYSLOG and trigger automation to assist the operational staff. All this infrastructure has made most of this log data unnecessary for operations and mainly of archival interest, if of any interest at all. The most valuable SMF data, usable for capacity planning, chargeback and other use cases, has been kept in expensive mainframe databases and processed by expensive reporting tools.

In parallel to the disruption being driven by emerging technologies, there is a skills crisis in the mainframe world: the experts who have been managing these systems for 40-50 years are retiring, and not enough people are being trained to replace them.

Fortunately, in the confluence of these two trends a solution is born. By leveraging this new ability to process mainframe log data in platforms like Splunk and Hadoop, a new generation of IT workers can assist “Mainframe IT” by proactively seeing problems emerge and assisting in their resolution. In the first wave of adoption this will help offset the reduced availability of mainframe skills, but it won’t obviate the need for them completely, and it won’t replace the old point-management tools. Yet.

As this technology matures, and machine learning solutions become proven and trusted, we will see a new generation of tools emerge. Based on deep learning, these will replace both the old mainframe tools and the personnel who used them but now want to be left in peace by the lake. My prediction is that as this becomes a reality, we will also see a move of analytics technology back onto the mainframe platform. The old dream of “autonomic computing” will become a reality and, in effect, a new mainframe will evolve: one that tunes and heals itself.

Find the Treasure!

Syncsort plans to be there; in fact, we are leading the way. We offer the keys to the treasure chest for anyone who wants to follow our map to the dream of great riches!


New Beginnings on Old Bedrock: Linking Mainframe to Big Data


Following 14 years at CA Technologies, where I held various senior management positions, I joined Syncsort in April of this year. I wanted to become a part of the leading company that is linking Big Iron to Big Data. What will that union yield for the industry, and for me?

David Hodgson, May 23, 2016

I really enjoyed working at CA and learned a lot over the years there. CA is a vibrant, energetic place to work. The employees are smart, the products are good, the installed customer base is amazing and a lot of innovation is occurring. Yes, on the mainframe side of the house too. Last year alone saw three entirely new mainframe products launched, and I am proud to have been a part of the team that did that.

Syncsort is an incredibly interesting company that I had been watching for a while: a forty-year-old mainframe company doing some of the most valuable innovation in the big data space for large enterprises. A few years ago, the company re-invented itself as the company to move mainframe data to analytics environments. Strategic partnerships with Hortonworks, Cloudera, MapR, Dell and Splunk, along with some great innovation by the development teams, have transformed Syncsort into a player in the Big Data ecosystem. In fact, Syncsort announced record 2015 results and promoted Josh Rogers to CEO to lead the company forward and fully realize the vision and potential we have for the next few years.

In my last few years at CA I was very focused on the Big Data space and was interested in the problems that CA could solve there. When Syncsort founder and previous mainframe GM Harvey Tessler decided he wanted to retire, I talked to Josh and the rest of the Syncsort management team, and we all agreed that I would be a great fit to take over the reins.

A few weeks into the role here I am thrilled with the decision to join. I love being part of a smaller company again where everything is more agile, just because of the small teams, shared mission and sense of urgency. We can do so much at Syncsort from our position of strength on the mainframe and our expertise in data management.

Having now met with several customers, I have confirmed the pattern of needs that we can address. Big Data platform ITOA solutions and business analytics are now the norm. Although the market is evolving quickly and requirements are changing, everyone is doing it. Those who think it’s still just talk are missing out big time. Most of these initiatives are not started by mainframe IT, but in companies with mainframes, the enterprise teams are now at the point of implementation where they realize they need the mainframe data for an effective or complete solution.

The broad use cases that we see include things like real-time monitoring of infrastructure or business services, and real-time awareness of access activity to help spot breaches in security or compliance. What these cases, and others, have in common is a deeper contextual analysis that is impossible with traditional point-monitoring tools. Done right, these solutions can be more effective than current practices and reduce cost by saving labor, penalties and software costs.

These same customers currently indicate that they are unlikely to dump the traditional management tools, but I actually wonder about that myself. As practices in data gathering and machine learning mature I think we will quickly see the start of next-gen automation that may make the old tools redundant. In the case of the mainframe this may become a necessity when, as an industry, we lose the skills of the baby boom generation and fail to replace the depth of knowledge they have.

By joining Syncsort I have brought myself to the coal-face, where we are mining the black-stuff out of the Big Iron legacy systems. As one of those whose career has been based on the strength of the mainframe, and its continual re-invention, I hope that I can be a part of the next round of evolutionary changes. Changes that will enable the mainframe to serve the industry for a renewed lease of life. New beginnings on old bedrock. The decade of ITOA and the dawning of AI applied to business systems.

Lessons from autonomous cars could drive your business


Among the more dramatic aspects of the Digital Transformation must surely be the prospect of driverless cars. It truly feels like science fiction becoming reality. It is useful to examine this phenomenon as it emerges to see what lessons about AI we can learn and apply to business transformations elsewhere.

David Hodgson, April 10, 2016

As the prospect of the mass deployment of driverless cars comes careening towards us, are there lessons about AI that you could learn to get a competitive edge for your business? Like many other applications of AI, autonomous vehicles have taken longer to arrive than expected, and longer than predicted even a few years ago. But make no mistake, the robots are coming, and all that we have imagined about AI will probably pale in comparison with the reality we will experience when it’s widely adopted.

It seems like every car manufacturer now has plans to introduce driverless vehicles, and there is continued growth in the use of software to transform the experience. Making the in-car experience a connected one is a natural extension of our lives now, and a recent McKinsey report showed that consumers are increasingly willing to pay more for this.

What could be

The development and imminent delivery of highly connected, self-driving cars is very exciting to those who love technology, and I am really looking forward to buying one! However, what is perhaps more interesting are the wide-ranging follow-on developments and ramifications that go far beyond our immediate riding experience.

Traffic lights will eventually disappear, as will driving licenses, both being redundant. Denser parking will mean more available space, as computers become adept at squeezing vehicles in and organizing them for access. There will be no more tickets or traffic courts, and a reduced police presence on roads. Traffic policies for special events can be piped to cars in the area programmatically and dynamically adjusted, reducing or eliminating frustrating backups.

Signposts will no longer be needed, although of course map data will be more important; but the cars could be the cartographers. Similarly, the cars could monitor the state of repair of roads and dispatch an autonomous truck with a mending crew of swarm bots and supplies to fix potholes. Damage from road usage might be reduced through coordination to vary the precise paths by slight offsets for more even surface wear.

Certainly there will be improved flow and throughput through reduced “noise” and disruptive movement. Connectedness could create convoys of cars going to similar destinations with individuals peeling off and joining as needed.

With reduced accidents, hospital and ER space will be freed up, and there will be massive disruption to the insurance industry, meaning it will probably be at the forefront of resisting adoption! For sure there will be massive disruption to jobs wherever AI-based robotic systems replace humans. One of the first might be the trucking industry, where driverless transport convoys are already being tested in Europe.

The general point here is that the AI required to drive cars drives a much broader impact on business and society than the specific solution area itself. The same will be true for the changes that AI makes to your business.

The bigger picture and what it means to you

Seen in this bigger context, autonomous cars become an instructive use case of the disruptive influence of AI on the business processes of an industry and connected markets. Whether or not your business will be impacted by driverless cars, you should get ahead of how AI can be leveraged in your industry. It might be a wave you can ride to survive, surfing over the dying competitors that ignore it.

The term “cognitive business” describes a company instrumented with systems that understand data and can realize new insights on their own. This is not as far-fetched as it sounds, and we can see the dawning of the possibilities with IBM’s offerings that open up Watson as a cloud service through APIs.

In this scenario computers do the more significant things faster, better and more reliably than people. Not just math and report creation, as they have done traditionally, but “thinking”, predicting and decision making. Eventually it leads to software that maintains and modifies its own algorithms to better solve problems, and to solve new ones.

Imagine a cognitive supply chain that can quickly adapt to real-time changes in demand, differentiate between local and national trends, and accurately predict the impact of upcoming events. Both known events, like social, sporting and weather events, and hidden-pattern events created, perhaps, by competitor activity or changes in consumer preferences. It could balance activity between online and bricks-and-mortar storefronts. It could optimize manufacturing, distribution and stock levels. And of course, given our theme, it could interact with fleets of autonomous distribution and delivery vehicles.

To achieve this and other scenarios, the AI will be integrated with huge amounts of “Big Data” but will also leverage human knowledge. Some of the most powerful solutions will be the interactions of experts with AI systems.  We have seen this already in the advanced weaponry of fighter planes and drone systems.  The medical world holds great promise for new solutions that combine expertise in this way too. There is no reason to think that advanced business systems will not be implemented in the same way.

You control your future

All this is in the future right now, but the sooner you start preparing, the more likely you are to be a winner. This means experimenting with advanced analytics now, finding new uses for existing data and discovering new sources of data. And while you do that, simultaneously start to grasp the security and compliance aspects of gathering and processing all this existing and new data in new ways.

The best time to plant a tree was 20 years ago, but assuming that your strategy planning has not been that prescient, then there is no time like the present to start planning for the future.


Image credit: CNN

Can AlphaGo Help You Stay Alpha Dog?


The recent triumph of the AI program AlphaGo playing against a human signals just how far advanced analytics has come. What lessons can you learn to get a competitive edge for your business?

David Hodgson, March 15, 2016

Almost two decades ago, in 1997, IBM’s Deep Blue chess-playing computer beat the reigning world champion, Garry Kasparov, in a six-game match under tournament conditions. The world realized, perhaps for the first time, that HAL of “2001: A Space Odyssey” fame was going to arrive at some point, though a few years later than Kubrick envisioned.

Then, in 2011, IBM’s Watson computer stunned us by winning at Jeopardy. If you haven’t actually seen Watson playing Jeopardy click on that YouTube link; it’s truly awesome. The feeling of invasion is greater seeing Watson, perhaps because we can all imagine playing Jeopardy, and the question and answer approach is so “human”.

Which brings us to current events. Google’s DeepMind research team has developed AlphaGo, which beat Fan Hui 5-0 last October. Hui is the reigning European Go champion and a 2-dan master. This was impressive enough, but today saw AlphaGo win 4-1 against Lee Sedol, a South Korean 9-dan grandmaster and one of the world’s strongest players. Send Lee Sedol a message of support somehow, because being at the coalface of human defeat by computers must be tough.

What is happening here?

Closed-system games like Chess and Go are complicated, but have simple rules and a known, although massive, number of variables. There are more possible Go board move sequences than the estimated 10^80 atoms in the visible universe. This is a formidable problem, but of a different sort to the open-ended question-and-answer format of a game like Jeopardy.

AlphaGo’s algorithms use a combination of value-weighted Monte Carlo tree search techniques and a neural network implementation. The DeepMind team’s approach to machine learning involved extensive training from both human and computer play. AlphaGo played itself to rapidly learn the outcomes of numerous different options.
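For the curious, the tree-search half of that combination rests on an exploration/exploitation trade-off. The sketch below shows the classic UCB1 scoring rule often used to pick moves in Monte Carlo tree search; it is an illustration of the general technique, not AlphaGo’s actual code, which also weights moves with its neural networks.

```python
import math

def ucb1(total_value, visits, parent_visits, c=1.4):
    """Upper-confidence score for a candidate move: exploitation
    (average value so far) plus an exploration bonus for rarely
    tried moves."""
    if visits == 0:
        return float("inf")  # always try unvisited moves first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def best_child(children, parent_visits):
    """children: {move: (total_value, visits)}. Pick the move with
    the highest UCB1 score to explore next."""
    return max(children, key=lambda m: ucb1(*children[m], parent_visits))

# Move "C" has never been tried, so the search explores it first,
# even though "A" currently looks best on average.
children = {"A": (6.0, 10), "B": (3.0, 4), "C": (0.0, 0)}
print(best_child(children, parent_visits=14))
```

Repeated over many simulated games, this rule concentrates effort on promising lines of play while still sampling neglected ones, which is how self-play lets the program “rapidly learn the outcomes of numerous different options.”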

Watson used Hadoop to store masses of unstructured information, including the entire text of Wikipedia, that it could search with analytical techniques in real time. Equally significant in Watson’s case is that it was responding to natural language questions that it had first to understand using similar search techniques.

A third powerhouse for change, Facebook, is also experimenting with AI systems and has its own Go-playing system, Darkforest, also based on combining machine learning and tree search techniques.

Between them and the numerous other AI projects underway in different domains, we have the building blocks for HAL’s arrival.

So what?

I hear some saying, “So what, David? This is interesting to learn about, and with the election I had missed it in the news, but of what importance is it to me?”

DeepMind is targeting smartphone assistants, healthcare, and robotics as the practical outcomes of its experimental work with AlphaGo. From its website:

“The algorithms we build are capable of learning for themselves directly from raw experience or data, and are general in that they can perform well across a wide variety of tasks straight out of the box.”

IBM has already applied versions of Watson to practical problems, offers it as a service for anyone to buy, and has a developer community to encourage experimentation. An example of a practical application is the partnership with Sloan Kettering to fine-tune cancer treatment. Similarly, DeepMind is partnering with the UK’s National Health Service to improve its services.

Although much secret sauce is often preserved for specific solutions, the framework of these systems is usually open-source software. An important component of Watson is the Apache Unstructured Information Management Architecture (UIMA) software. These same tools and techniques will be what disrupt your business soon, and you will want to be an early adopter.

Fed with the right data, a Watson-type system could answer new questions that nobody yet knows the answers to. Or, applied to real-world problems, an AlphaGo-type system could decide on the best course of action given many variables and alternatives. Leading the field in practical solutions, IBM calls this “cognitive business”, and it is definitely a part of our future.

You Control your Future

In the panorama of the Digital Transformation, AI is out there as a wildcard with seemingly limitless possibilities. We are both familiar with, and scared of, these futures because of numerous science fiction dramas. HAL is not here yet, but it’s coming. For you it’s really a case of whether your company or the competition deploys machine learning systems first. You don’t need an AI system to answer that question.


Image credit: NYTimes



Data Makes the World Go Round


The key to success in the digital age is transforming your business model by leveraging data in a way that either improves your operations or adds value to your customers. Unlike money, much of the data that makes the world go round is free! Where can you get it and how could you use it?

David Hodgson, March 7, 2016

A year ago I posted on the idea that “data is everywhere”. Since then there is more data in the world, greater accessibility and better tools. Read that old post if you have time because it will still add something to what you learn here if you are a newcomer to Big Data Analytics.

What Data?

Of course your company has been using data for years, and doing analysis and reporting on it too. With an explosion of new analytical tools and the low cost of cloud-based facilities and storage, the world has discovered that the business data you have held in structured databases for years is just the tip of the iceberg. With so many recreational and business activities now conducted online, the world is generating much more data, and this data, often very unstructured and fragmented, is becoming both accessible and useful.

This data might in fact be in-house, in system logs and Excel spreadsheets, i.e. not readily available for large-scale analysis. But it might be outside the company: a new asset that you could acquire and derive value from. It could be as crazy-sounding as data from social media feeds (like Twitter or Facebook), or it could be more down-to-earth, in the form of semi-structured lists. Most interesting is that some of it could be free!

To find some of this “new” data you might start by simply googling “free data sources”. This will yield many references to follow up, but some of the richest data, and the most useful to business, is provided by government agencies. The list below will lead you to a wide variety of examples.

Sources of Free Data from Government Agencies

US Government’s Open Data
US Census Bureau 
National Climatic Data Center 
The CIA World Factbook 
Socrata – Open Data Network
European Union Open Data Portal 
UK Open Government Initiative
NHS Health and Social Care Information Centre 
Government of Canada Open Data
Open Government Data Catalogs 
UNICEF Statistics and Monitoring
World Health Organization

But how do I use it?

Using data from these or other sources is a multi-stage process. First you have to access the data and bring it into your repository, maybe Hadoop. Then you must clean it up or format it so that it fits your use (key-value pairs, JSON, structured, semi-structured, etc.). Lastly you have to create the analysis and visualization that will give you the valuable insights you seek, often by pairing it with your existing structured data.

Accessing data could be a simple Extract, Transform & Load (ETL) process if you have access to the source database. Often, though, the data you want is only accessible through a defined Application Programming Interface (API). The Socrata link above details APIs for all of the thousands of data sources listed (e.g. CA Prop 39 grants).
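To make the access-then-clean steps concrete, here is a minimal Python sketch of pulling rows from a Socrata-style open-data endpoint (Socrata datasets expose a standard `/resource/{id}.json` URL with `$limit` and similar query parameters) and trimming each record down to the fields you care about. The domain and dataset id shown are placeholders, not a real dataset.

```python
import json
import urllib.parse
import urllib.request

def soda_url(domain, dataset_id, limit=100):
    """Build a SODA-style query URL (Socrata's standard JSON endpoint)."""
    query = urllib.parse.urlencode({"$limit": limit})
    return f"https://{domain}/resource/{dataset_id}.json?{query}"

def fetch_rows(domain, dataset_id, limit=100):
    """Fetch rows from a dataset as a list of dicts (one per record)."""
    with urllib.request.urlopen(soda_url(domain, dataset_id, limit)) as resp:
        return json.loads(resp.read().decode("utf-8"))

def clean_rows(rows, fields):
    """Keep only the fields we care about; missing values become None."""
    return [{f: row.get(f) for f in fields} for row in rows]

# The domain and dataset id below are hypothetical -- substitute a real
# dataset found through the Socrata Open Data Network before running.
# rows = fetch_rows("data.example.gov", "abcd-1234", limit=50)
# print(clean_rows(rows, ["year", "amount"]))
```

From here the cleaned rows can be loaded into whatever repository you use (Hadoop, a warehouse, or just a local analysis notebook) and joined against your existing structured data.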

The API is the underpinning of most apps in the digital economy, and the way apps interact or are integrated. Having a programming team that can use publicly available APIs to access data is an essential step for you. One of the hubs of information on APIs is the Programmable Web site. If you are building your analytics app on a hosted platform like AWS, Azure or Google, the platform offers databases of useful data that you can access for free through its APIs. For example, see the list of AWS public datasets.

But how could any of this be useful to me?

Firstly, you need to educate yourself about the potential of Big Data analytics and the availability of data from numerous sources. You don’t know what you don’t know until you start to look at what is going on. I have heard stories of banks detecting ATM weaknesses in their competitors through social media feeds and using that knowledge to target new customers. Or the insurance company that red-flags a fraudulent claim by spotting that the two opposing parties have been Facebook friends for years.

Perhaps these examples of how some government agencies are using data will start the thought process. Or read this report from McKinsey for results from the world of business.

Once you have the general picture, the questions you want to answer become more important than finding data. Step back and think about questions that, if you could answer them, would give you some sort of operational advantage or competitive edge. Make sure that you know what you would do with the answers if you had them! Then hire a good data scientist or two and start a first project to amaze yourself. They will find the data to answer the questions and work out how to do the analysis and visualization.

What can go wrong here?

Many things could go wrong, but I will highlight three:

1. You spend too much and go over budget

There’s no such thing as a free lunch, right? Processing even free data is going to be time-consuming and costly as you hire the right people, buy the right tools and experiment your way to a useful result.

2. You under use the data and fail to get full value

The real value of your analytics will be realized as you allow business users to interact with the data in the context of making day-to-day decisions. Limiting access just to the data scientists who started the initiative will be a big mistake.

3. You forget the need for security & compliance

Probably the biggest thing that can go wrong, especially as you expand access to business users, is falling foul of privacy and compliance regulations. For Hadoop-based systems you will need to leverage security frameworks like Apache Ranger, Sentry and Knox to pass user information up to the application layer and enforce access authorization based on data lineage.
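As a sketch of what enforcing access authorization looks like in practice, the snippet below assembles the kind of JSON policy document that Apache Ranger’s admin REST API accepts for HDFS: read-only access to one path subtree for a named set of users. The service name, path and user names are hypothetical, and the exact schema varies by Ranger version, so treat this as an illustration of the shape rather than a drop-in policy.

```python
import json

def hdfs_read_policy(service, name, path, users):
    """Assemble a minimal Ranger-style HDFS policy dict: read-only
    access to one path subtree for the given users. Illustrative
    shape only -- check your Ranger version's REST API docs."""
    return {
        "service": service,
        "name": name,
        "resources": {
            "path": {"values": [path], "isRecursive": True},
        },
        "policyItems": [
            {
                "users": users,
                "accesses": [{"type": "read", "isAllowed": True}],
            }
        ],
    }

# Hypothetical names -- a real deployment would POST this JSON to the
# Ranger admin server's policy endpoint.
policy = hdfs_read_policy("hadoop_prod", "analysts-read-claims",
                          "/data/claims", ["analyst1", "analyst2"])
print(json.dumps(policy, indent=2))
```

The point of expressing policy as data like this is that access rules can be reviewed, versioned and audited centrally instead of being scattered across individual applications.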

Get Going

Data really does make our digital world go round. If you haven’t yet found new data sources and new ways to use data to supercharge your business, then you may be behind your competitors. Get going: find a project to start a “Big Data” initiative and begin your own Digital Transformation.


Image credit: Unknown origin



The Digital Transformation – Advice to a CEO


Tell a CEO about all the exciting trends and possibilities we are experiencing as part of the digital transformation and they are going to ask “What’s the bottom line?” We can’t give dollar-based answers to that question outside of a specific proposition, but we can focus on results and ramifications.

David Hodgson, February 24, 2016

Commentators like me often focus on the causes and drivers of change, and this can be useful. But this blog entry will focus on what I see as the three main overarching results of the ongoing digital transformation.

This analysis may make specific ramifications clearer to a CEO and help them see how to adapt a strategy to both survive and maybe gain a competitive edge. It is a generalized overview. As a CEO you must answer your own specific questions to see and create the results that will allow you to successfully compete.

#1 The digitization of all we use and touch

A convergence of technical breakthroughs and economic forces is now setting a course for the digitization of everything. This ranges from the mundane, like images, text and music, to the incredible, like genomes, body parts and our very thoughts.

Digital forms are always more flexible, lower cost and ultimately more valuable than their physical equivalents, because of the ways they can be re-used in the digital world.

Any business that thinks it won’t be impacted is likely to be the next to go bankrupt. No industry or job will remain the same and many will cease to exist.

What digital assets do you have, or should you acquire to compete in this new age? Can you be the leader in transforming your area by transforming something important to a new digital form that could amplify its value? Could you leverage a new source of data, or an API from one of the new digital players?

#2 The democratization of playing fields

In his book “The World is Flat”, Thomas Friedman describes something he labels “Globalization 3.0”, which has had the effect of leveling the global playing field in terms of commerce. Underlying what he describes are the same forces behind what we now call the digital transformation.

But commerce is just one playing field that is being leveled. We can now all have access to knowledge, tools and, perhaps most importantly, communities that enable us to be artists, scientists, musicians, developers or inventors. Whatever we want to do. Few domains of activity remain exclusive to an elite in the way they used to be.

This does not mean that there aren’t economic and power elites. In fact within many communities (like the USA) the last decade has seen an economic polarization and a concentration of money and power. However, globally we see a flattening and ultimately I think that the forces of the digital transformation will make massive polarization impossible within any community. But then I have always been a technological optimist.

CEOs must realize that the forces of digital transformation have empowered consumers and small businesses more than large companies. In many cases power is being fragmented and distributed, largely through the app and the smartphone as tools of digital conveyance. For example, banks have to compete harder to retain fickle customers who can now switch easily. Teenagers don’t have to buy an album to hear the one song they like. Alternative offerings are easily compared and there is always a new competitor.

What do your customers really want? What is changing about how they want to consume or use what you have to sell? It is imperative that your strategy focuses on gaining a competitive edge via the experience and value you deliver to the “users” of your digital assets.

#3 The disruption of incumbents

Facets of the first two results described above combine to create this third result. Generally speaking, because of the digitization of everything and the leveling of playing fields, the cost of entry in any business area is greatly reduced.

It costs so little to set up what would have been considered a sophisticated business just a few years ago that it becomes effective for small startups to offer products and services for “free”. Incumbents used to near monopolies, or established customer bases, might have been charging premium prices for what can now be offered massively cheaper.

The software industry is being turned upside down by open source software. Amazon offers cheap IT infrastructure. The hotel industry is being threatened by Airbnb turning a community of individuals into an unstoppable competitor. We know what happened to Blockbuster.

Sometimes, as with Uber, the incumbent is threatened mainly by what the user perceives as a more attractive experience. For Uber this means easier access, better quality and a reasonable price compared to taxis and regular car services.

Death or Glory

The CEO of an established firm must realize that it is death or glory for incumbents. Who are your new “Digital Transformation” competitors? If they aren’t here yet, how could they appear? What is it about your current business model that could be a weakness in this new world?

In summary, my advice is that it is transform or be transformed, and you must embrace Clayton Christensen’s thinking on the “Innovator’s Dilemma” to engage the issues.


Image credit: McKinsey