Never mind 'big data'

We're still coping with the era of 'big spreadsheets'

Guest post by Cheryl McKinnon

Gartner Inc. recently projected that "big data" will drive $232 billion in IT spending by the end of 2016. While not a market unto itself, "big data" is expected to have a significant impact on the storage, business intelligence, database and middleware markets over the next few years. Right now, social media and machine-generated information are the primary sources of data for this new generation of analytical tools. "User-generated content" on the web, on social networks, and shared via hundreds of apps and cloud services dominates our conversations about information management in 2012.

Pundits, vendors and consultants are quick to jump on technology innovation, touting the next shiny object that will transform the way business is done. The big data bandwagon is not unlike the hype around social media, "engagement" and "experience management." These trends reflect a desire to explain or improve business activities by looking at external patterns and behavior, rather than at internal ones.

What was the original "user-generated content" to have a material impact on internal business operations? It is, perhaps, the lowly, unglamorous spreadsheet. Information workers have been creating and maintaining spreadsheets for nearly 35 years, yet few organizations have applied analytics and intelligence-mining tools to this existing treasure trove of in-house data. What secrets, patterns and insights remain locked away in those ad-hoc worksheets? Enterprises that fail to recognize the data floating under the radar in spreadsheets run two major risks: ignoring information needed to make consistent decisions, and letting errors and non-compliance proliferate in the dark of user-defined rows and columns.

Big data gets all the hype today, but enterprises around the globe continue to be run by big spreadsheets--typically with few quality controls, little analytics, and no simple way of testing, validating or detecting patterns. In a 2009 Deloitte survey, 99.7 percent of businesses reported using spreadsheets, and 70 percent of respondents reported "heavy" reliance on spreadsheets to support critical portions of their businesses; yet only 42 percent of companies included spreadsheets in overall risk reporting and assessment. Organizations need to understand the information held in such mission-critical spreadsheets and perform the quality assurance and analysis to ensure the data is accurate and appropriately used.

Spreadsheet data is "enormous in size and impact," according to academic research presented in 2012 by Professors R. Panko and D. Port. End-user computing applications such as MS Excel have had tremendous effects on organizational decision-making, yet the practice "seems to be invisible to the central corporate IT group, general corporate management, and information systems researchers," according to Panko and Port. User-generated content (what Panko calls "end-user computing") presents opportunities for insight, but also risk: compliance and privacy violations, errors, inadvertent copies of data that should have been destroyed. This dark data needs to be analyzed and understood, not ignored.

Academic research at Carnegie Mellon University used U.S. Department of Labor statistics to project that by 2012 there would be 90 million computer users in the American workplace, with 60 percent--some 55 million people--using spreadsheets or simple databases as part of their day-to-day work. European research, done at the Delft University of Technology, reveals that financial analysts spend an average of three hours a day in Microsoft (NASDAQ: MSFT) Excel, and that a typical spreadsheet has an average life span of five years and is used by up to 13 different analysts.

Despite this pervasive reliance on spreadsheets, testing for errors, consistency of use, and QA of custom macros are almost non-existent in big companies. Professor Ray Panko estimated that in any given spreadsheet, 2-5 percent of formulae are wrong--regardless of the experience level of the spreadsheet user--and up to 5 percent of these errors can be "material" to business operations. Incorrect forecasting, cost management or revenue projections can cost a business millions of dollars, and even help disguise internal fraud. Findings presented to the European Spreadsheet Risks Interest Group (EuSpRIG) in 2006 by researchers Grenville Croll and Raymond Butler reviewed the use and testing regimen for spreadsheets used in clinical medicine: lack of data validation, persistent use of "cut and paste" shortcuts, poor documentation, and relatively "unfettered manipulation" of data and formulae were demonstrated, with real-world life-and-death consequences.
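To give a flavor of the kind of automated testing these findings argue for, here is a minimal sketch in Python that flags one well-known risk pattern: numeric constants hardcoded inside formulae instead of referenced from input cells, so a changed assumption (a tax rate, an exchange rate) silently fails to propagate. The `check_formula` helper and the sample formulae are illustrative assumptions, not taken from any particular governance product, and the sketch assumes uppercase A1-style cell references.

```python
import re

# Match a number that is NOT part of a cell reference such as A1, B12
# or $A$1 (those digits are preceded by a letter or "$").
HARDCODED_NUMBER = re.compile(r'(?<![A-Z$])\b\d+(\.\d+)?\b')

def check_formula(cell, formula):
    """Return a list of warnings for one formula string, e.g. '=A1*1.07'.

    `cell` is the address of the cell holding the formula; it is only
    used to label the warning messages.
    """
    warnings = []
    body = formula.lstrip('=')
    for match in HARDCODED_NUMBER.finditer(body):
        warnings.append(
            f"{cell}: hardcoded constant {match.group()} in {formula}"
        )
    return warnings
```

A real review regimen would pull formulae out of the workbook itself and add many more checks (broken ranges, pasted-over formulae, undocumented macros), but even a one-pattern scan like this surfaces cells a human reviewer should look at first.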

So how can organizations reduce the risk presented by reliance on the "big spreadsheet"? How can organizations encourage access to this essential business data, yet improve governance to ensure its accuracy and compliance?

The first recommendation is to understand where critical business data is being stored. Odds are good it will be in a spreadsheet, even if it was originally exported from a line of business application. Understand how this information is being crunched, sorted and analyzed. The Deloitte study recommended creating an inventory of mission critical spreadsheets. Organizations with an information governance strategy, ECM deployment or internal data map should be ahead of the game on this recommendation. Provenance, use case, sensitivity levels and confidentiality of the data should be identified. Appropriate metadata, tagging and security for spreadsheets should be part of any information management program.

Second, do not skimp on analytic tools that can be used and understood by a typical business worker. There are 90 million computer users in the American workforce, and the majority of them use spreadsheets in some way. Several inexpensive spreadsheet governance and analytical tools on the market don't require major corporate IT investments. As the "consumerization of IT" continues, even moderately tech-savvy managers, analysts and operations teams can make better, more consistent decisions with extended analytics and governance for their bread-and-butter spreadsheet data. Seventy percent of businesses already make heavy use of spreadsheets for critical corporate decisions, yet fewer than 34 percent attest to using specific techniques to control and manage spreadsheets and to periodically evaluate those controls.

Finally, don't pin hopes of improved business productivity and operational insights solely on "big data" when the foundational analysis and quality assurance of the "big spreadsheet" hasn't even begun for most companies. Take a look at the dark data stored in departmental spreadsheets, off the CIO's radar, to see what quick wins can be made with the untapped data sources already in-house. Real-time insights are amazing, but historical context from existing data sources--probably a spreadsheet--can help ground the excitement of big data analytics.

Cheryl McKinnon is the President of Candy Strategies Inc., a consulting firm that helps companies adapt to the new digital workplace. This article was adapted from a blog post she wrote for InformationActive Inc., a software provider that develops data analytics for spreadsheet users.