Overview
Big box stores like Wal-Mart and online retailers like Amazon have been leveraging "Big Data" for years, but they have millions of customers and hundreds of thousands of transactions a week. How can small organizations hope to tap into Big Data when they don't hit a hundred thousand transactions in a year? The answer is that while smaller organizations likely won't achieve the Big Data capabilities of their larger counterparts, that doesn't mean they shouldn't travel the path.
Big Data, much like "The Cloud", is an industry buzz term comparable to large scale Business Intelligence. Gartner defines Big Data as "high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making."
Plotting points on a graph helps to conceptualize the high-volume aspect of Big Data. If you have two points on a graph, A and B, a straight line is drawn and an assumption is made that the line will continue its trend after point B.
If you have points A.1, A.2, A.3 and A.4 between points A and B, some insight is gained into the ups and downs between the two points, and an educated guess can be formed as to which direction the line will go, and how steeply, after point B. With Big Data you have thousands or millions of points between A and B, providing tremendous insight into the trends occurring between the two points, and your confidence in where the line will go after point B soars.
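The point about volume can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the underlying trend is a straight line with a slope of 2, the measurements are noisy, and the helper names (`fit_line`, `noisy_sample`, `slope_spread`) are hypothetical. It fits a trend line to repeated samples and compares how much the estimated slope swings when you only have points A and B versus when you have a thousand points between them.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def noisy_sample(n, true_slope=2.0, noise=5.0, seed=0):
    """n evenly spaced points from A (x=0) to B (x=10) with random noise.

    The true slope and noise level are made-up values for illustration.
    """
    rng = random.Random(seed)
    xs = [i * 10.0 / (n - 1) for i in range(n)]
    return [(x, true_slope * x + rng.gauss(0, noise)) for x in xs]


def slope_spread(n, trials=200):
    """Range (max - min) of fitted slopes across repeated noisy samples."""
    slopes = [fit_line(noisy_sample(n, seed=s))[0] for s in range(trials)]
    return max(slopes) - min(slopes)


two_point_spread = slope_spread(2)       # only points A and B
many_point_spread = slope_spread(1000)   # a thousand points between A and B

print("slope spread with 2 points:   ", round(two_point_spread, 3))
print("slope spread with 1000 points:", round(many_point_spread, 3))
```

The narrower the spread, the more you can trust where the line goes after point B. Real business data is rarely this tidy, so treat the sketch as a picture of the idea rather than a forecasting method.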
High-variety means Big Data doesn't just look at your organization's internal databases or typical external data sources such as industry compiled data; it looks everywhere. Phase 1 in the Path to Big Data (outlined below) provides examples of the variety of data that can be tapped.
An analogy using animation helps explain the concept of high-velocity. Imagine a single static report generated from an application (financial, HR, etc.) as a one-page cartoon sketch. Think of Business Intelligence as multiple sketch pages in a book, where flipping through the pages produces basic animation. Think of Big Data as taking the sketch book and turning it over to the computer-generated imagery (CGI) talent at Pixar. Now you start to get an idea of what Big Data can mean to business.
The Path to Big Data
While achieving the Big Data capabilities of the big box stores is likely out of reach for smaller organizations, their journey to achieve Big Data can empower them with capabilities most of their competition won't have and change the way their organization makes decisions. To provide transparency to the journey, I’ve broken the path into 5 phases*.
Phase 1: Identify Data Sources
The foundation of Big Data is the data sources, and in any system the outputs are only as good as the inputs. Identifying potential high-variety data inputs is an exercise organizations can start now. A few examples include:
- Internal structured data sources - structured data has a predetermined model; think of an online web form requesting first name, last name, street, city, state and zip. This data is typically found across a company's business systems such as customer transaction, financial, customer relationship management (CRM), inventory and website tracking systems.
- Internal unstructured data sources - unstructured data has no predetermined model; think of an email or a tweet. This data is typically found across a company's communication and social media systems such as email, instant messaging, Facebook and Twitter.
- External data sources - an area where small organizations can capitalize on the Big Data industry trend. There are a variety of sources that offer external data sets, from public sources such as the US Government at http://catalog.data.gov/dataset to private sources such as Microsoft's Azure Marketplace at https://datamarket.azure.com/browse/data or Amazon Web Services at http://aws.amazon.com/datasets. I recently came across a client, a progressive city municipality, that publishes Geographic Information System (GIS) data sets on an open platform readily available for public access.
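To make the structured versus unstructured distinction concrete, here is a minimal Python sketch. The `WebFormRecord` class, the `extract_hashtags` helper and the sample values are all made up for illustration; the fields simply mirror the web form example in the list above.

```python
import re
from dataclasses import dataclass


@dataclass
class WebFormRecord:
    """Structured data: every record has the same predetermined fields."""
    first_name: str
    last_name: str
    street: str
    city: str
    state: str
    zip_code: str


def extract_hashtags(text):
    """Unstructured data has no fixed fields, so it must be parsed first."""
    return re.findall(r"#(\w+)", text)


# Made-up sample values, not real customer data.
record = WebFormRecord("Jane", "Doe", "1 Main St", "Springfield", "IL", "62701")
tweet = "Loved the new store layout! #shopping #local"

print(record.city)              # fields can be read directly by name
print(extract_hashtags(tweet))  # meaning has to be extracted from free text
```

The structured record can be queried by field right away, while the tweet needs some form of parsing or text analytics before it can sit alongside it in an analysis.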
Phase 2: Identify Gaps in Data
Once the data sources are inventoried, they can be reviewed for potential gaps in data collection. Is the website tracking system only recording page hits, or are tools such as Google Analytics gathering more robust data like page duration and navigation paths? What about the Facebook and Twitter results from marketing campaigns? Is this data being collected and organized with one of the many analytics tools for social networks? In this phase a data architect or consultant can add tremendous value by identifying unrealized gaps. Be sure business objectives, success metrics and end user requirements are gathered as a baseline for an IT professional to determine gaps.
Phase 3: Address Data Gaps
After identifying the gaps, the level of effort to address them is assessed against the anticipated value of the data. This is a difficult undertaking, because a core concept of Big Data is that you don't really know what you're looking for until it is presented. Err on the side of conservatism, since Phase 5: Measure and Repeat lets you revisit the decision in a later iteration. If it is anticipated that addressing the data gap will provide a return on the investment, current systems must be augmented or new systems implemented to capture the required data. This phase should involve IT professionals (data architects, system engineers, IT consultants, etc.) to guide organizations to a comprehensive data gathering solution**.
Phase 4: Solution Development
For most small to midsized organizations this phase involves solution identification, selection and implementation. Few organizations can realize a return on investing in a customized business intelligence and analytics solution, because building costs more than buying. Excellent resources are available for solution identification, such as Gartner's Magic Quadrant for Business Intelligence and Analytics Platforms, 2013. When acquiring technology of this magnitude, organizations should take a rigorous approach to vendor identification and solution selection. This article goes into limited detail on implementation because of the size and complexity of selecting and implementing the right solution to meet a particular organization's requirements.
Phase 5: Measure and Repeat
I advocate starting small in both the data gap and solution development phases, using multiple iterations to meet final objectives. Overly ambitious projects can lead to high implementation costs, long project schedules and lofty expectations. This project will be a learning process for your entire organization, so take it slow. At the end of each iteration measure the results against success metrics and project/business objectives, and refine the scope of the next iteration accordingly. This iterative approach delivers faster results and helps control costs. Use unbiased IT professionals to guide you through each phase, and beware of vendors who steer you away from the early phases straight to the solution.
Conclusion
While this is a broad and simple look at the path to Big Data, the intent is to show that Big Data is accessible to smaller organizations and to outline the general requirements for a project of this magnitude. Some of the activities listed can take months or years, such as addressing data gaps through implementation of new systems, so start planning now. Remember, on the journey to Big Data, always keep your business goals and objectives clearly defined and in sight; it's easy to get lost in the weeds with a project of this size and complexity.
There are plenty of books on the topic that provide a much more comprehensive and detailed review of Big Data. Two nontechnical books that I've read and recommend are Secrets of the Big Data Revolution and Big Data, A Revolution.... Please feel free to share insights and lessons learned on your path to achieving Big Data capabilities.
*Standard project activities are omitted, including establishing objectives, success metrics, gathering requirements, etc.
**Phases 3 and 4 have the potential for solution sharing, so engage all involved IT professionals in both phases.