Analyst(s): Mark A. Beyer, Anne Lapkin, Nicholas Gall, Donald Feinberg, Valentin T. Sribar
Key Findings
- Concern over big data represents the first manifestation of the extreme challenges that will overwhelm existing information management practices and technology.
- The ability to manage extreme data will be a core competency of enterprises that increasingly use new forms of information (text, social, context) to look for patterns that support business decisions (Pattern-Based Strategy).
- In practice, multiple factors will move toward the extreme and will interact to make data management more complicated.
- Enterprises are beginning to focus on high volumes of information to the exclusion of the many other dimensions of information management, leaving massive challenges to be addressed later.
Recommendations
- Analyze business and operational problems using the 12 dimensions of extreme information management, particularly if those problems arise suddenly or seem unusual or unexpected.
- Re-examine the enterprise's five-year plans from the perspective of the 12 dimensions. Rank the importance of the various dimensions and determine which are current pain points.
- Link data management practices across dimensions. For example, to ensure the fidelity of a set of data, apply your evaluation standard to the metadata that came along with the data, and pay attention to all the access enablement and control dimensions to ensure the metadata hasn't been tampered with.
- Assess the ability of the current information management infrastructure to handle all the dimensions of extreme information management environments effectively.
Table of Contents
Analysis
  Introduction
  The Uses and Dangers of the Term "Big Data"
    Examples of How Data Volume Evolves Into Other Challenges
    "Big Data" Starts a Conversation About Other Dimensions
  Extreme Aspects of Information Management
    Big Data Starts With Quantification
    Addressing Access and Qualification
    Pattern-Based Strategy Demands Extreme Information Management
  What to Do About Extreme Information Management Issues
    Recognizing the Challenge
    Planning for Extreme Information Management
    Managing Data Extremes
  Summary
  Appendix: The Economics of Data
Recommended Reading

List of Figures

Figure 1. "Big Data" Concepts Create an Unbalanced Data Environment
Figure 2. Dimensions of Information Management
Figure 3. Pushing Information Initiatives to the Extreme
Analysis
Introduction
Big data is so large that it exceeds the capacity of traditional data management technologies; it requires new or exotic technologies simply to manage the volume alone. But processing matters, too. A complex statistical model can make a 300GB database "seem" bigger than a 110TB database, even if both run on multicore, distributed parallel-processing platforms. Big data has quickly emerged as a significant challenge for IT leaders. The term only became popular in 2009; by February 2011, a Google search on "big data" yielded 2.9 million hits, and vendors now advertise their products as solutions to the big data challenge. Inquiries about big data from Gartner clients have risen sharply as well.
This interest springs from an increase in data volumes within enterprise systems, caused by transaction volumes and other traditional data types as well as by new types of data associated with next-generation operational technology (OT), video streams, images, audio, social networks and so on. Social networking alone could bring huge external datasets into the enterprise, either as actual data or as metadata and links from blogs, communities, Facebook, YouTube, Twitter, LinkedIn and others. Importantly, volume refers not only to storing data but also to the analytics that process the information. Too much information is a storage issue, certainly, but too much data is also a massive analysis issue. In addition, context-aware computing and next-generation devices could bring in another huge set of data created or captured on mobile phones and tablets (for example, images, video and audio), as well as contextual data such as location data, previous searches, preferences, ratings, and future information types we don't know about yet.
Enterprises often address new information management challenges with one-off solutions, and the
big-data challenge could unfortunately follow the same pattern. Certainly, big data will require new
approaches to distributed parallel processing (such as MapReduce). But the increasing velocity,
variety and complexity of data also pose a challenge, and some information managers have
deployed systems that assume users have qualified the information and have appropriate access to
it. In other words, "big data" implies other dimensions besides volume, and these dimensions
become critical for use cases such as Pattern-Based Strategy, e-discovery, information governance
and context-aware computing. Information managers may be tempted to focus on volume alone
when they are losing control of the access and qualification aspects of data at the same time. If they
do focus too narrowly, their enterprises will have to make massive reinvestments within two or three
years to address the other dimensions of big data.
Today's information management disciplines and technologies are simply not up to the task of
handling all these dynamics. Information managers must fundamentally rethink their approach to
data by planning for all the dimensions of information management. The business's demand for
access to the vast resources of big data gives information managers an opportunity to alter the way
the enterprise uses information. IT leaders must educate their business counterparts on the
challenges while ensuring some degree of control and coordination so that the big-data opportunity
doesn't become big-data chaos, which may raise compliance risks, increase costs and create yet
more silos.
- Real-time data, which focuses on what is happening right now, not on what has already happened. It enables situational awareness. Real-time data raises the issue of perishable data (data freshness) and "orphaned" data (which no longer has valid use cases but continues in use nonetheless; this is different from orphaned data that has lost integrity).
- Shared data, which focuses on information shared across applications. To share information effectively, enterprises must ensure the data is consistent, usable and extensible. People can combine it with data from other sources and easily share it with other users. More importantly, shared data complicates the task of determining the authority of information.
- Linked data, which comes from various sources that have relationships with each other and maintain this context so as to be useful to humans and computers. (Linked data often uses the Resource Description Framework data model and Uniform Resource Identifiers to name data objects, which can be accessed via HTTP; a minimal sketch follows this list.) Once data is linked by a user, a relationship in that data persists from that point forward. The Linked Data group, a public organization, addresses these concepts.
- High-fidelity data, which preserves the context, detail, relationships and identities of important business information (often via embedded metadata). Most importantly, high-fidelity data allows new meanings to be added without destroying the previous meaning of the data.
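To make the RDF/URI mechanics concrete, here is a minimal sketch using the Python rdflib package (assuming it is installed); the namespace, URIs and the "purchased" property are illustrative assumptions for this example, not part of any Linked Data specification.

    # Minimal linked-data sketch: name data objects with URIs, record their
    # relationships as RDF triples, and serialize them for HTTP publication.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import FOAF, RDF

    EX = Namespace("http://example.org/")  # hypothetical namespace

    g = Graph()
    customer = URIRef("http://example.org/customer/42")

    # Each triple preserves a relationship and its context, so the link
    # persists and stays interpretable by humans and machines alike.
    g.add((customer, RDF.type, FOAF.Person))
    g.add((customer, FOAF.name, Literal("Jane Example")))
    g.add((customer, EX.purchased, URIRef("http://example.org/order/9001")))

    print(g.serialize(format="turtle"))

Once the order URI above resolves over HTTP, any consumer can follow the link and qualify or weight the relationship without the producer's involvement.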
Gartner contends that terms like "big data," "real-time data" and "linked data" signal a new era in which the economics of data (not the economics of applications, software or hardware) will drive competitive advantage (see Appendix). But these big-data and real-time practices and data management theories create a pool of data dependent on external factors, or otherwise stretch conventional data management technologies and practices beyond their capacity. Traditional information management techniques usually assume cohesive control of both the storage processes and the integrity of the information, and then link disparate sets via metadata instructions executed in a type of application server. With big data, the process must move to the data, instead of moving the data from its stored location into a process and then back to write out the result. The only merit of the traditional approach is that it happens to be in place; it provides no real advantage and actually increases the number of hours required to maintain and modify it.
Big data will cause traditional practices to fail, no matter how aggressively information managers address dimensions beyond volume. Data and system architects have learned that immediate business needs drive the information architecture along one or more dimensions of data management, sometimes to the exclusion of the others. But the moment any information or data asset leaves its original process, the excluded dimensions of information management reassert themselves.
- Volume of data.
- Classification.
- Contracts.
- Pervasiveness.
- Technology-enablement.
- Fidelity.
- Linked data.
- Validation of data.
- Perishability.
- Velocity involves streams of data, structured record creation, and availability for access and delivery. Velocity means both how fast data is being produced and how fast the data must be processed to meet demand.
- Variety includes tabular data (databases), hierarchical data, documents, e-mail, metering data, video, image, audio, stock ticker data, financial transactions and more.
- Complexity means that different standards, domain rules and even storage formats can exist for each asset type. An information management system for media cannot have only one video solution (a sketch of handling this variety follows this list).
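As one illustration of the variety and complexity dimensions, the following hypothetical Python sketch dispatches each (asset type, storage format) pair to its own handler rather than forcing one solution per asset type; every name in it is invented for this example.

    # Variety/complexity sketch: a registry of handlers keyed by asset type
    # and format, since no single "video solution" covers every format.
    from typing import Callable, Dict, Tuple

    HANDLERS: Dict[Tuple[str, str], Callable[[bytes], None]] = {}

    def register(asset_type: str, fmt: str):
        """Register a handler for one (asset type, storage format) pair."""
        def wrap(fn: Callable[[bytes], None]) -> Callable[[bytes], None]:
            HANDLERS[(asset_type, fmt)] = fn
            return fn
        return wrap

    @register("video", "mp4")
    def ingest_mp4(payload: bytes) -> None:
        print(f"indexing {len(payload)} bytes of MP4 video")

    @register("document", "pdf")
    def ingest_pdf(payload: bytes) -> None:
        print(f"extracting text from {len(payload)} bytes of PDF")

    def ingest(asset_type: str, fmt: str, payload: bytes) -> None:
        # Each combination carries its own standards, rules and formats.
        handler = HANDLERS.get((asset_type, fmt))
        if handler is None:
            raise ValueError(f"no handler registered for {asset_type}/{fmt}")
        handler(payload)

    ingest("video", "mp4", b"\x00" * 1024)  # -> indexing 1024 bytes of MP4 video

The design point is that new formats extend the registry without touching existing handlers, which is how variety grows without multiplying complexity.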
[Figure 1. "Big Data" Concepts Create an Unbalanced Data Environment. The figure shows concentric circles with four outward-pointing axes: volume, velocity, variety and complexity.]
In Figure 1, the middle portion of the concentric circles represents traditional operational
parameters that most IT leaders are comfortable with. As the outward-pointing arrows indicate, a
data environment can become extreme along any of the four dimensions or all of them at once.
Multiple factors will move the data challenge toward the extreme and will interact to make data
management more complicated.
For example, an increase in the speed with which data changes may accompany an increase in data volumes. Or adding new data types to the environment (such as video) may mean dealing with technologies for which there are no standard formats, and therefore with more complex data types. Thus, IT leaders cannot focus on data volume or any other single dimension in isolation. They must be aware of all the dimensions moving toward extremes in their environment, then learn how these dimensions interact with each other. Information managers must make decisions every day that balance the long-term needs of the enterprise against the immediate pain points.
Big data suggests a new scenario for information architecture that can solve problems we cannot address today. If data can be analyzed rapidly by moving the application to the data, instead of the other way around, we can run multiple scenarios against the data very quickly. Instead of running data quality rules in separate batch applications, the same rules can move to the data along with the analytics model. When we remove the issue of data transport and processing along the "route," we can analyze information under multiple scenarios very quickly.
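A minimal sketch of this idea, assuming a partitioned dataset with invented field names: the quality rule ships with the aggregation, so each partition is validated and reduced where it is stored, and only small per-partition results cross the network.

    # "Move the process to the data": validate and aggregate per partition,
    # rather than transporting raw records to a central batch job.
    from functools import reduce

    def quality_rule(record: dict) -> bool:
        # Example rule: drop records missing an amount or with a stale timestamp.
        return record.get("amount") is not None and record.get("ts", 0) > 0

    def map_partition(partition: list) -> float:
        # Runs where the partition is stored: filter, then aggregate locally.
        return sum(r["amount"] for r in partition if quality_rule(r))

    partitions = [
        [{"amount": 10.0, "ts": 1}, {"amount": None, "ts": 2}],   # bad amount dropped
        [{"amount": 5.0, "ts": 3}, {"amount": 7.5, "ts": 0}],     # stale record dropped
    ]

    # Only the small per-partition results travel over the network.
    total = reduce(lambda a, b: a + b, (map_partition(p) for p in partitions))
    print(total)  # 15.0

This is the shape of the MapReduce-style approach mentioned earlier; changing the scenario means shipping a different small function, not re-moving the data.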
[Figure 2. Dimensions of Information Management. The figure arranges the 12 dimensions by category; legible labels include quantification (volume, variety, complexity), contracts, pervasive use, perishability, fidelity and linking.]
The access enablement and control dimensions set the stage for who can see data, how fast it should be provided, the different delivery mechanisms and much more. As a result, they provide the most significant opportunity to plan for context-aware computing, data center modernization, and the convergence of OT and IT.
- Contracts involve agreements on who will share information and how, both inside and outside the enterprise, usually represented by metadata. This dimension also includes the terms of sharing, how the records will be exposed, the intended use, how long the information can be used and so on. These contracts are required to satisfy compliance requirements, especially with respect to external data transfers between enterprises, such as a retailer sending files to a data enrichment service to add demographic detail to customer information (a sketch of contract metadata follows this list).
- Pervasiveness refers to information and data that becomes "hot" and is in great demand across the organization. How long does data remain active? What do you do with orphaned data that has outlived its value but for some reason keeps hanging around?
- Technology-enablement involves specifications, derived from the other 11 dimensions, that guide the design and integration infrastructure of systems such as data integration tools, data quality tools, master data management and application middleware.
- Fidelity means the ability or inability to confidently adapt an asset for wider use.
- Linking involves data in combination and the uses related to this context.
- Validation ensures that the information was created in accordance with a complete understanding of the use cases, and includes all the other aspects of data quality. Unknown future use cases make validation a constant challenge.
- Perishability refers to the confidence that the data remains valid, and reaches all use cases while it remains so. What is the shelf life of the information? How long does it remain useful? How long should it be kept? What are the aging aspects of information?
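To illustrate the contracts and perishability dimensions together, here is a hypothetical Python sketch of a sharing agreement expressed as metadata that travels with the data, plus a shelf-life check; the class and field names are invented for this example, not a Gartner specification.

    # Contracts as metadata: who shares with whom, for what use, for how long.
    from dataclasses import dataclass
    from datetime import datetime, timedelta, timezone

    @dataclass(frozen=True)
    class SharingContract:
        provider: str
        consumer: str
        intended_use: str
        expires: datetime                       # the term: how long the data may be used
        allowed_fields: tuple = ("name", "postal_code")  # what may be exposed

        def is_valid(self, now=None) -> bool:
            # Perishability check: is the agreement, and hence the data, still fresh?
            return (now or datetime.now(timezone.utc)) < self.expires

    contract = SharingContract(
        provider="retailer",
        consumer="data-enrichment-service",
        intended_use="append demographic detail to customer records",
        expires=datetime.now(timezone.utc) + timedelta(days=90),
    )
    print(contract.is_valid())  # True while inside the 90-day term

Evaluating the contract before every transfer is one way the metadata itself enforces compliance, rather than relying on each application to remember the terms.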
Many current solutions focus on the validation and classification dimensions to ensure that only the correct people look at and qualify data before it is made public. These solutions are appropriate for application design, but their narrow focus means that the other dimensions must be addressed when information reuse, integration and sharing begin. Ignoring the qualification and access control dimensions leaves a hole in an information management strategy.
As previously noted, extreme management issues like big data will have a significant impact on the reuse, analysis and sharing of information assets. This will affect efforts to manage Pattern-Based Strategy, data center modernization, OT/IT convergence and context-aware computing. Figure 3 shows how these issues drive these 21st-century information management and utilization initiatives, and highlights the fact that ignoring qualification and access/control leaves a gaping "hole" in an information management strategy. Importantly, while quantification is depicted (with big data appropriately along only one axis), all 12 dimensions remain important. Pattern-Based Strategy, as an engine of change, uses all the dimensions (not just quantification) in its pattern-seeking process. It then provides the basis of the modeling for new business solutions, which allows the business to adapt. The seek, model and adapt cycle can then be completed in various mediums, such as social computing analysis or context-aware computing engines. Finally, newly modified application engines create more data, and the cycle repeats itself.
[Figure 3. Pushing Information Initiatives to the Extreme. The figure maps enterprise systems and data sources (transactional data, documents, search/mobile, social networking, and IT/OT text, images, audio and video) along the volume, velocity, variety and complexity axes toward Pattern-Based Strategy, social computing and context-aware computing. OT = operational technology.]
The dimensions within each category interact to complicate the challenge of managing information
and data. Urgent demand for a particular set of data across the organization and higher
expectations for data validation may work against each other. Making data available to more people
and technologies exacerbates the challenge of preventing unauthorized access. The dimensions
interact between categories as well. For example, linking external data to internal sources increases
the challenge of maintaining a consistent ontology. Adding metatags to illuminate the context of
data can vastly increase the size of the data and further complicate technology issues.
Interactions between dimensions can also solve problems. For example, data validation (data quality) can introduce bias, but the metadata of high-fidelity data can explain where that bias comes from and how to address it. Linked data, once linked, can be qualified and the linking weighted, based on metadata found in the other three axes of the qualification category.
Will users need to know the context around data more than they do now, or will the life cycle of
data become critical?
Next, IT leaders should rank the importance of the various dimensions within each of the three
categories and then for all three categories together. Leaders should revise their data architecture to
reflect these priorities. The 12 dimensions can also help leaders make the trade-offs necessary to
support extreme data. For example, if linked data becomes most important, perhaps the business
can afford to lower its requirements for other dimensions such as the variety of data types
supported or the validation of data. IT leaders should work with business managers to agree on the
right set of compromises.
This kind of big data analysis can also establish patterns and solutions for managing the disparate,
unqualified and high-volume data that is available on the public cloud. Techniques for accessing
and analyzing the very largest datasets apply to data from the cloud. Public data is just as "dirty"
and unqualified as any other source. The strategies that IT leaders have developed for ensuring the
fidelity of data can work for large, publicly available datasets. The largest datasets will require the
use of public or private cloud computing resources to achieve periodic and on-demand scaling.
campaigns across inventory sources). In some cases, both enterprises and vendors will be able to
use venture capital to develop solutions for extreme data management.
Summary
Clients and vendors increasingly encounter a phenomenon they call "big data," but the term is
sometimes misleading because the challenge has many dimensions in addition to the volume of
data under management. Gartner has identified 12 dimensions in three categories: quantification,
access enablement and control, and information qualification and assurance. These dimensions
interact with each other to exacerbate the challenges of next-generation information management.
IT leaders must recognize the signs of these challenges, design information architectures and
information management strategies to address them, and deploy new technologies and practices to
manage data extremes, as traditional methods will fail. Failure to plan for all data dimensions in
systems deployed in 2011 and 2012 will probably force a massive redesign for more expansive
capabilities within two or three years. IT leaders should stop thinking about big data volumes alone
and consider all 12 dimensions of extreme information management when they develop modern
information architectures and strategies. This perspective will enable them to make intelligent
compromises.
Recommended Reading
Some documents may not be available as part of your current Gartner subscription.
"Findings: 'Big Data' Is More Extreme Than Volume"
"Hadoop and MapReduce: Big Data Analytics"
"2011 Planning Guide: Data Management"
GARTNER HEADQUARTERS
Corporate Headquarters
56 Top Gallant Road
Stamford, CT 06902-7700
USA
+1 203 964 0096
Regional Headquarters
AUSTRALIA
BRAZIL
JAPAN
UNITED KINGDOM
© 2011 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written permission. If you are authorized to access this publication, your use of it is subject to the Usage Guidelines for Gartner Services posted on gartner.com. The information contained in this publication has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information and shall have no liability for errors, omissions or inadequacies in such information. This publication consists of the opinions of Gartner's research organization and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Although Gartner research may include a discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner is a public company, and its shareholders may include firms and funds that have financial interests in entities covered in Gartner research. Gartner's Board of Directors may include senior managers of these firms or funds. Gartner research is produced independently by its research organization without input or influence from these firms, funds or their managers. For further information on the independence and integrity of Gartner research, see "Guiding Principles on Independence and Objectivity."