Annual Analytics Assessment


At this time last year I celebrated the Chinese New Year by publishing my predictions about the world of analytics and Big Data. It’s always fun to look back and hold yourself accountable, so let’s go!

David Hodgson, January 14, 2016

In 2015 the Chinese Year of the Ram started on February 19. For 2016, the Year of the Monkey, the new year has moved forward to start on February 8. The date is set by the lunisolar Chinese calendar, so it bounces around a bit compared to our Gregorian calendar, which approximates the solar year.

So how did I do on my five predictions?

To keep me honest you can read my blog entry from last year here.

#1 I correctly predicted that people would still be using the term “Big Data”. No one likes this buzzword, but those who predicted its demise were too optimistic! It’s here to stay because culturally we like reductionist simplicities.

#2 I may have been a bit optimistic in saying that every major enterprise would have a real, funded Big Data strategy by the end of 2015. I bet it’s close though. Analysts reported continued adoption driven by line-of-business (LoB) departments, with Marketing front and center.

#3 I said that “data agility” would be the aspirational driver of big data strategies. We did indeed continue to see people moving data from traditional proprietary data warehouses into more portable forms. However, the erosion slowed as the incumbents found ways to embrace Hadoop, and the new became an addition to the old rather than a replacement for it. ’Twas ever thus with IT.

#4 I said that 2016 would get us to 10 years of Hadoop being used as the primary tool of Big Data analytics. I got this right, even though several forecasters were saying that we would have moved on from Hadoop to other technologies. The elephant still stands largest and squarely in the middle of the room. True, we saw Cloudera and many enterprise adopters embrace Apache Spark over MapReduce, but still in the context of the Hadoop stack and ecosystem.

#5 The IoT did not yet become the shaping force that I thought it might. Of course it waits in the wings, growing slowly (who got Nest thermostats last year?), just needing enterprises to really embrace the new technologies involved.

So what trends did emerge to continue as shapers for 2016? I’ll just call out two areas this year.

#1 Analytics and the IoT

It is still early days for enterprises finding business advantage with analytics on IoT-generated data, but I think we will see significant progress in the year ahead. Most speculators would bet on the manufacturing industry for results here. But given the proliferation of wearable technology in the last 18 months, my bet is that we will see serious headway in healthcare-related analytics. Whether this prediction comes true probably depends on progress in the next area.

#2 Security and Compliance

Security and compliance issues remain a barrier to larger-scale production implementations, particularly where personally identifiable information (PII) may be involved. I predict that we will start to see better-defined processes and procedures around the handling and merging of structured and unstructured data.

Otto Berkes of CA Technologies has suggested that Bitcoin’s blockchain protocol could be re-used as a secure and validated way for IoT devices to communicate and exchange data. Otto is a lot smarter than me, so I will just say that in 2016 we will see stronger solutions emerge to make the IoT secure and less vulnerable to corruption by hackers.

The Monkey Wrench

OK, so that’s really three predictions, not two. I’ll review them next January. What will the Year of the Monkey really bring? With the economy picking up steam, analytics will be central to IT investment and hiring. We will see a lot of companies copying each other (as monkeys do), but let’s look out for the “alpha ape” trendsetters, those who will take us into new territory. Who are you watching? Let me know by commenting at the top of this blog.



What Big Data dreams may come


Instead of asking, “What’s the big deal with Big Data?” try asking, “What’s the big dream?”

David Hodgson, March 26, 2015

The best ventures start with a big dream and big dreams come in different sizes. In the world of data analytics this might be cracking a top secret code, growing your business, detecting fraud in real time or successfully making purchasing recommendations based on profiles and patterns. However big your dream appears to other people, it is important to you and the goals you want to achieve, so it is worth planning properly and equipping yourself with the tools you need.

In my last few blog posts, I’ve been talking about the challenge of gaining a competitive edge for your business with all the data that is available to you, structured and unstructured. This is the Big Dream.

The importance of creative experimentation and diversity

The growth and transformation of data analytics in recent years has generally been fueled by the interest from lines-of-business (LOBs) within a company – not by the central IT department. The LOBs have wanted to experiment with tools they thought might give them a competitive edge, an extra business insight. This experimentation is an essential ingredient of success and LOBs must be allowed to empower themselves by choosing the tools and services they need.

This revolution has been enabled by the availability of commodity compute power (on-premises and cloud) and the emergence of open source tools driven mainly by the Apache Software Foundation. The diversity of tools that has emerged is incredible. Hadoop is not one thing, but a stack of technology surrounded by an ecosystem of supporting tools that allow you to build the solution you need. I wrote about this explosion of creativity in an earlier blog post. There are multiple distributions of the stack, each with proprietary code added for different advantages. And then there are numerous NoSQL databases (Cassandra, MongoDB and probably 30 others) that are suited to different sorts of analytic operations.

This creative experimentation and diversity is still incredibly important. We have not yet seen the end of the tool evolutions and are just beginning to realize the potential of these new technologies and the ways businesses can transform themselves in the application economy.

How can IT help?

IT’s big dream is to be more than a cost center: to align with the business and become an essential ingredient in the recipe for achieving the company’s goals. In the fast-proliferating ecosystem of analytical tools and components, LOBs must have the freedom to experiment with the latest technologies. We are past the point when IT can control and limit software usage for its own convenience. To remain relevant, the IT department must support the new big data playbook and be the experts who learn the latest technology.

But how can IT do this and control costs? Even as a driver of business success, the IT department cannot run amok past the boundaries of its budget. So how can it both encourage creative experimentation and contain costs? How can it find all the skills it would need to use all the different tools?

A Big Data management strategy

CA Big Data Infrastructure Management (BDIM) is now being demonstrated and may be an answer for IT departments as they seek to help LOBs with analytics. CA BDIM provides a normalized management paradigm for different Hadoop distributions and NoSQL databases. It is a tool that will scale with your growing analytics implementations and will allow the LOBs to easily switch tools or platforms.

This single, unified view will help reduce operational costs as experiments scale to production operations. With automation, productivity is increased and the differences between Hadoop distributions become a non-issue. For IT, the potential is a single management tool to manage all clusters, nodes and jobs. For LOBs, the potential is an IT partner that can take on the burden of production operations while leaving them the freedom to easily change direction.

CA BDIM will be on display at the Gartner Business Intelligence & Analytics Summit, March 30 through April 1, 2015 in booth 525 in Las Vegas. This first release is targeted at early adopters while we aim to deliver new functions and support every 60 days and evolve the product to deliver the maximum value for those that want to use it.

Is the Big Dream just big hype or your next key move?

While some analysts might say the industry is climbing up one side of the “hype cycle”, I don’t believe that one curve fits all phenomena. Real results are being achieved by innovators in this area. Many others still don’t understand the full potential, may be confused and can sound like detractors, but on this occasion that noise does not indicate hype. We are in the middle of a paradigm shift in which old technologies and practices are left behind and new ones are adopted.

Companies with aspirations of winning in the application economy will take advantage of the new analytics and partner with their IT department to move ahead and succeed. A few weeks ago, in North America at least, we adjusted our clocks to spring ahead into brighter, longer days. Check out CA BDIM and see for yourself whether the product could help your company spring ahead of the pack to realize your Big Dream of success.


Big Data – It’s a zoo out there


Even the data analytics darling of today could be extinct tomorrow – how to tame the beast that is Big Data with the right IT skills and tools.

David Hodgson, February 12, 2015

Hadoop became a rock star in 2014, emerging into mainstream IT from relative obscurity and being recognized by analysts in formal market analyses. But just as important as Hadoop itself is the plethora of other tools in the ecosystem, also fueled, in the main, by the influential Apache Foundation.

The revolution in data analytics we see today just would not have happened without the confluence of open source software and very cheap processing power, whether that’s cloud or commodity servers in-house. Those two forces were like the finger of God in the software world, kicking off the equivalent of a Cambrian explosion of engineering creations.

My illustration below gives a brief overview of some of the major parts of the Hadoop ecosystem, but there are actually many others; this was all I could fit easily on one PowerPoint slide for a recent talk I gave on the subject.


Large animal pictures

The peculiar, perhaps Indian-sounding name Hadoop was taken from a toy elephant belonging to the creator’s son, hence also the logo. Following this theme, Mahout is an Indian term for the elephant keeper, the person who leads and maintains control over the elephant. And Ambari is the name of a special sort of howdah, the seats or thrones that elephants can carry on their backs in India.

It’s this complexity, with its implication of arcane knowledge known only to insiders, that is still holding back many companies from being successful. Yes, we have yet another IT skills shortage, and we will be fighting over the best talent in this area for a while yet, probably until more tools emerge that either bring order to the chaos or entirely remove the need for the lower-level knowledge.

With all this complexity it really is a zoo out there, hence the need for Apache ZooKeeper, a tool that allows you to track the configuration data for all these components and ensure that you maintain the connections as you move systems and components around or move new projects into production.

Natural selection or genetic engineering?

Great diversity is always indicative of creative change – the evolutionary forces are certainly at work here. Many new species and varieties appear constantly and certainly some of the creations we see today will be extinct tomorrow. Preserved perhaps, stuffed and inactive in a museum of software, but no longer a part of the living zoology.

We are already seeing a decline in the use of the original MapReduce process and growing use of SQL layers to process Hadoop data. Even Hadoop itself, today’s data analytics darling, could be extinct tomorrow, displaced in the dominant gene pool by Pachyderm: software that is related to it only by the allusion in the name. The latter is an exciting new startup that uses Docker containers to store the data and is built on CoreOS for the processing infrastructure.
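
For readers who have not seen the two styles side by side, here is a minimal sketch in Python of why SQL layers are winning people over. It is an illustration only and uses no Hadoop tooling: the sample data and the function names (SALES, totals_mapreduce, totals_sql) are made up for this example, and Python's built-in sqlite3 module merely stands in for a SQL-on-Hadoop layer. The first half spells out the map, shuffle and reduce steps by hand; the second half asks the same question in one declarative query.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample records: (department, sale amount)
SALES = [
    ("marketing", 120),
    ("finance", 80),
    ("marketing", 30),
    ("finance", 20),
    ("ops", 50),
]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for dept, amount in records:
        yield dept, amount


def shuffle_phase(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def totals_mapreduce(records):
    """Total per department, written as explicit map/shuffle/reduce steps."""
    return reduce_phase(shuffle_phase(map_phase(records)))


def totals_sql(records):
    """Same totals as one declarative query (sqlite3 is only a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (dept TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT dept, SUM(amount) FROM sales GROUP BY dept"
    ).fetchall()
    conn.close()
    return dict(rows)
```

Both functions return the same per-department totals. The difference is in who does the plumbing: with the SQL version the engine decides how to group and aggregate, which is much of the appeal for analysts who already know SQL.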

Perhaps saying, “It’s a zoo out there,” is an understatement and really it is like the actual jungle where only the fittest will survive this initial bloom of new life. Hearing this, the timid may well decide that they don’t want to come outside to play; they will stay indoors with their RDBMS and traditional data warehouses. I suspect the Dodo did that!

If you want to avoid becoming a fossil yourself, you cannot hang back; now is the time for IT to learn this stuff and for lines of business (LOBs) to start demanding access to it via their IT departments, or to simply bypass IT and start playing with it on the cloud somewhere.

IT as zookeeper or ringmaster

So how does IT tame this jungle, circus or whatever metaphor you like best for this wild ride? How can they manage this diversity and both give their LOBs the tools that will drive a competitive edge for the company, and contain costs at the same time?

CA Technologies has the answers and will be talking about them at the Gartner BI and Analytics event in March. Be there or risk becoming a stony artifact of your former self!

How are you taming big data within your organization? Leave me a comment below.