
/// IDG Tech Dossier

HP CONVERGED STORAGE:

Advances in Deduplication Help Tame Big Data

CONVERGENCE HELPS ORGANIZATIONS MASTER THE ART OF DEDUPLICATION


IN TODAY'S HYPERCONNECTED WORLD, with its multiple mobile devices, ubiquitous Internet access and pervasive social media platforms, people expect immediate access to information and services. These expectations are increasingly felt in corporate IT departments, where business units demand instant applications and turn-on-a-dime services. Virtualization and cloud computing can help corporate IT meet these demands by helping it become more flexible and agile. But the ultimate solution is to transform the way IT is delivered. Many enterprises have already started on the journey toward a full IT as a service (ITaaS) model.

As organizations travel this road, however, they often run into a wall. Actually, several walls, including those between the server, storage and networking functions. The traditional IT infrastructure is often too rigid to enable companies to fully utilize their IT resources. In many cases, servers, storage and networking have been built and managed separately, creating functional silos. And within the storage architecture, an explosion in the amount and types of data, coupled with new demands from the virtualization of servers and clients, has made storage increasingly inflexible and complicated to manage. These factors stand in the way of the kind of adaptability, agility and integrated management that the efficient enterprise requires. If organizations are to continue toward the goal of delivering ITaaS, they need to break down these barriers and lay the groundwork for a next-generation architecture.

/// THE LIMITS OF TRADITIONAL STORAGE

The typical storage architecture was designed 20 years ago, when workloads were predictable and data was structured. But today companies are dealing with an unprecedented amount of information, including unstructured data such as audio and video, which requires massive capacities. Storage systems must accommodate many different types of workloads with different performance requirements. Add to the mix increasingly demanding applications, distributed data center environments, legacy business processes that must be supported and nonstandard infrastructure inherited through acquisitions, and you get a gerrymandered architecture comprising many discrete storage resources that must be managed individually.


THE JOURNEY TO AN EFFICIENT ENTERPRISE

Organizations typically pass through five phases as they transform their traditional operations into an IT-as-a-service model:

• Standardize and consolidate

• Virtualize and automate

• Self-provision services on demand

• Aggregate internal and external services

• Become an IT service bureau

Such an architecture is disruptive to scale, expensive to own and operate, and increasingly difficult and labor-intensive to manage.

ITaaS requires a pool of storage that's flexible and fungible. The IT staff must be able to quickly configure storage for a particular need and then just as quickly reconfigure it so it can be used again elsewhere. The storage must be malleable so that capacity can be quickly expanded, data and applications can be easily and securely migrated, and workloads can be automatically rebalanced. Applications need to be online 24/7/365, so high availability is paramount. Finally, management of the entire storage pool, as well as coordination with virtualized servers and networking, should be streamlined and simplified.

/// THE PATH TO IT AS A SERVICE

Organizations need a strategy for rearchitecting storage so that it enables, rather than constricts, the delivery of IT services. According to HP, it's all about Converged Storage, which breaks through the barriers, reducing complexity so that IT can expand storage on a pay-as-you-grow basis. It involves the creation of a pool of storage based on modular building blocks that can be moved and reconfigured on the fly to meet a range of needs. In fact, HP's Converged Storage approach incorporates several core capabilities:

• MULTITENANCY: the ability to securely host many different applications in a single pool of storage, delivering the appropriate level of resources and performance for each application

• FEDERATION: the ability to geographically distribute storage resources and move data among those resources without disrupting user access to that data

• EFFICIENCY: the ability to allocate resources in the most cost-effective manner through thin provisioning and other techniques

• AUTONOMIC MANAGEMENT: the capability to reconfigure itself, balancing workloads and determining the appropriate tiering of data without manual intervention

All companies need to protect their data with solutions that have these characteristics, incorporating technologies such as deduplication, which removes redundant data for better capacity utilization. That enables companies to deal effectively with big data requirements.

/// DATA DEDUPLICATION 2.0

A converged storage strategy can help companies more easily deal with the various storage challenges they face, including the increasing amounts of unstructured data they must manage. Also known as big data, unstructured data includes any data that is not in a structured database format. That's everything from e-mail to Microsoft Word and PowerPoint documents, to video and audio recordings. The increasing amounts of big data plaguing companies result in one or more of four pain points:

• Shrinking backup windows amid ever-increasing amounts of data

• Increasingly difficult disaster recovery processes, including use of tape that must be transported to and from a backup site

• Lights-out data protection requirements for remote and branch offices where no IT personnel are onsite

• The need for rapid file restore, which involves finding the right tape and matching files with compatible backup systems

For all of these reasons, it makes sense for companies to try to reduce the amount of data they store and, hence, have to back up. One way a converged storage infrastructure helps do that is through advanced data deduplication technology.


/// DEDUPLICATION 1.0


For several years, storage systems have offered deduplication technology that helps address today's challenges by eliminating duplicate occurrences of data, thus reducing the volume of data that companies must store. These solutions identify where duplicate data exists and then write it only once, while creating an index of pointers that indicate where the duplicate blocks should live in various files so they can be rebuilt as needed.

Specifically, deduplication goes right to the heart of the four pain points outlined above. The technology improves backup speed by reducing the amount of data that needs to be backed up. The systems can deal with multiple simultaneous streams of backup to a single device at rates of up to 28 TB per hour. Numerous individual backup streams from a range of heterogeneous platforms can be consolidated onto a single disk-based backup device, which not only improves performance but also ensures that all backup data is in the same place, regardless of the platform it came from. Deduplication also lowers overall storage costs by decreasing the amount of data that needs to be backed up and by increasing efficiency.

By dramatically reducing the amount of data that needs to be backed up, by up to 95 percent in some cases, deduplication makes data replication to a remote disaster recovery site a practical alternative to using tape-based backups. Once a data set is established at the backup site, only changes to the data need to be replicated over the WAN. Similarly, data restores are much faster, since they're handled over the WAN and don't involve finding and transporting tapes. And the entire process can be automated, managed centrally via a single pane of glass. All of this dramatically increases the reliability of data backups.

Automated backup and disaster recovery also means there's no need for operator intervention at remote sites, since data center staff can handle all tasks. Tape handling can also be eliminated, thus freeing up staffers' time at remote sites. Most backup applications can track where data is stored following a replication, making restores faster and easier. And by reducing data volumes, deduplication enables more data to remain on hand locally in near-term storage for even faster file restores. All of this once again results in lower costs by requiring less bandwidth for backups. Additionally, deduplication reduces restore complexity, because all data can be restored from the same backup device, regardless of the platform it came from.
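To make the write-once-plus-pointer-index mechanism concrete, here is a minimal sketch in Python. It is illustrative only, not HP's implementation: the class name DedupStore, the fixed 4 KB block size and the file names are all invented for the example. Each block is hashed, unique blocks are stored once, and a per-file manifest of hash pointers records how to reassemble the original file.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks for simplicity; real systems often use variable-size chunks

class DedupStore:
    """Toy block-level deduplication store (illustrative only)."""

    def __init__(self):
        self.blocks = {}     # hash -> block bytes; each unique block is stored once
        self.manifests = {}  # file name -> ordered list of block hashes (the "index of pointers")

    def backup(self, name, data):
        pointers = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:      # write duplicate data only once
                self.blocks[digest] = block
            pointers.append(digest)
        self.manifests[name] = pointers

    def restore(self, name):
        # Rebuild the file by following its pointers into the unique-block pool.
        return b"".join(self.blocks[digest] for digest in self.manifests[name])


if __name__ == "__main__":
    store = DedupStore()
    original = b"A" * 8192 + b"B" * 4096
    edited = b"A" * 8192 + b"C" * 4096          # shares its first two blocks with the original
    store.backup("report_v1.doc", original)
    store.backup("report_v2.doc", edited)
    assert store.restore("report_v2.doc") == edited
    print(f"{len(store.blocks)} unique blocks stored for 2 files")  # 3 unique blocks, not 6
```

In this toy example the second file version adds only one new block to the pool, because its unchanged blocks are already indexed.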

/// NEXT-GENERATION DEDUPLICATION


Whereas first-generation deduplication technology represented a significant step forward in dealing with big data, next-generation products are now emerging that bring even greater benefits, including increased compatibility and availability as well as improved restore performance.

Initial deduplication products were focused on various vendors' point storage solutions. As such, they achieved deduplication by using different, often incompatible, algorithms. So if data needed to be sent between storage systems, it often needed to be reconstituted and then deduped again on the target system. Next-generation deduplication products, or Dedupe 2.0 systems, use a common deduplication algorithm across all storage systems, whether they're smaller systems in branch offices or large data center storage facilities. That means no more reconstituting data as it traverses different storage systems, which saves bandwidth and improves performance.
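As a rough illustration of why a shared algorithm matters, the Python sketch below shows two deduplication stores that chunk and hash data the same way, so a backup can be replicated without being rehydrated: the source consults the target's chunk index and sends only the chunks the target lacks. The Site class, the chunk size and the backup names are hypothetical; this is not the StoreOnce replication protocol.

```python
import hashlib

CHUNK = 4096  # both sites must chunk and hash identically for cross-system dedup to work

class Site:
    """Toy dedup store; both sites share the same chunking and hashing scheme."""

    def __init__(self):
        self.chunks = {}     # digest -> chunk bytes
        self.manifests = {}  # backup name -> ordered list of digests

    def ingest(self, name, data):
        digests = []
        for i in range(0, len(data), CHUNK):
            piece = data[i:i + CHUNK]
            d = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(d, piece)
            digests.append(d)
        self.manifests[name] = digests

    def replicate_to(self, target, name):
        """Replicate a backup without rehydrating it: send only chunks the target lacks."""
        digests = self.manifests[name]
        missing = set(digests) - target.chunks.keys()   # the target reports what it already holds
        for d in missing:
            target.chunks[d] = self.chunks[d]           # only these chunks cross the WAN
        target.manifests[name] = list(digests)
        return len(missing), len(digests)


if __name__ == "__main__":
    branch, datacenter = Site(), Site()
    branch.ingest("mon_full", b"x" * 40960)                 # Monday's full backup
    branch.replicate_to(datacenter, "mon_full")
    branch.ingest("tue_full", b"x" * 40960 + b"y" * 4096)   # Tuesday adds one new chunk
    sent, total = branch.replicate_to(datacenter, "tue_full")
    print(f"sent {sent} of {total} chunks over the WAN")    # sent 1 of 11 chunks
```

If the two sites used incompatible algorithms, the source would have to reconstitute the full backup and the target would have to re-deduplicate it, which is exactly the overhead a common algorithm avoids.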

THE POWER OF DEDUPLICATION SOFTWARE

HP Data Protector software, powered by HP StoreOnce deduplication, enables clients to:

• Maximize IT staff resources through remote deployment and management of deduplication stores from a central data center

• Control licensing costs by redeploying deduplication agents on application or backup servers, at no cost to existing customers with Advanced Backup to Disk licenses

• Reduce compliance risk in a small or standalone office by automating retention times for different data types and removal of expired data


First-generation deduplication technology was also focused more on backup performance than restore performance. That's a growing issue as companies deal with increasing amounts of big data: the more data that's backed up, the faster restores need to be. Dedupe 2.0 products can deliver restore speeds that are just as fast as backup speeds.

Dedupe 2.0 products also deliver high availability, which is increasingly important in helping companies back up more data in the same or shortened backup windows. Under such circumstances, companies can't afford to have a backup process fail at 3 a.m. and require a restart. Some Dedupe 2.0 systems can now be configured to have a backup storage system kick in if a primary system fails, all without operator intervention. That means there's no single point of failure in the backup process, a crucial consideration for massive storage systems that have to back up hundreds or thousands of servers on a routine basis.
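The failover idea can be pictured with a small sketch, again in Python. This is a generic illustration of a backup job continuing on a standby target when the primary fails mid-stream; it is not the internal mechanism of any HP appliance, and the BackupTarget class, the fail_after switch and the chunk data are all invented for the example.

```python
class TargetFailure(Exception):
    """Raised when a backup target stops accepting data."""

class BackupTarget:
    """Hypothetical backup target; `fail_after` simulates a node going offline mid-job."""

    def __init__(self, name, fail_after=None):
        self.name, self.fail_after, self.received = name, fail_after, []

    def write(self, chunk):
        if self.fail_after is not None and len(self.received) >= self.fail_after:
            raise TargetFailure(f"{self.name} went offline")
        self.received.append(chunk)

def backup_with_failover(chunks, primary, standby):
    """Stream chunks to the primary; if it fails, continue on the standby without a restart."""
    target = primary
    for chunk in chunks:
        try:
            target.write(chunk)
        except TargetFailure:
            target = standby        # fail over automatically, no operator intervention
            target.write(chunk)     # re-send only the chunk that was in flight
    return target

if __name__ == "__main__":
    data = [f"chunk-{i}".encode() for i in range(6)]
    primary = BackupTarget("node A", fail_after=3)   # fails partway through the 3 a.m. job
    standby = BackupTarget("node B")
    finished_on = backup_with_failover(data, primary, standby)
    print(f"backup finished on {finished_on.name}: "
          f"{len(primary.received)} chunks on A, {len(standby.received)} on B")
```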

/// ONE STEP AT A TIME


Many of these Dedupe 2.0 technologies were developed by HP Labs and are now included in the HP StoreOnce family of dedupe appliances. One example is the common deduplication algorithms that enable all systems to deal with deduplication in the same way, a concept the technology leader calls federated deduplication. Federation means that data never has to be rehydrated as it passes from one system to another, thus enabling companies to save time and money on backups, since they don't need as much local- or wide-area bandwidth.

HP Labs has also developed specialized large container technology that improves data layouts in a storage system and enables restores to occur just as fast as backups, up to 28 TB per hour with the HP B6200 Backup System. That kind of performance is crucial in helping companies meet aggressive recovery time objectives (RTOs) after a failure or a disaster.

HP's pioneering advancements have resulted in technology with no single point of failure across nodes, controllers, cache, disks, paths, power and cooling, because each node is paired with a partner node that can take over if its companion fails. And, when used with certain backup applications, intelligent storage systems can automatically detect certain failures and take necessary corrective actions, including restarts, all without operator intervention. All of that comes while operating up to twice as fast as most competing systems and with three times the capacity.

Companies enjoying the benefits of Dedupe 1.0 technology will immediately see the added value that Dedupe 2.0 products can bring to their big data storage. Those that have yet to introduce deduplication in their environment can skip the 1.0-generation products altogether and immediately garner the benefits next-generation technology brings to a converged storage environment. By using these concepts as a base, organizations can develop an ideal storage platform to support virtual and cloud computing. Indeed, HP's Converged Storage will enable organizations to deploy storage 40 percent faster, reduce the time it takes to deliver IT services from weeks to minutes, reduce energy use and physical space requirements by 50 percent, and cut the time and expense of managing storage systems.

For more information on HP's deduplication technology and products, click here.
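As a back-of-the-envelope worked example of what restore throughput means for an RTO, the short Python snippet below uses the 28 TB/hour restore rate cited above; the data set size and the RTO target are invented for illustration.

```python
# Back-of-the-envelope recovery-time check.
DATA_SET_TB = 90            # hypothetical amount of data to restore after a disaster
RESTORE_TB_PER_HOUR = 28    # maximum restore rate cited for the HP B6200 Backup System
RTO_HOURS = 4               # hypothetical recovery time objective

restore_hours = DATA_SET_TB / RESTORE_TB_PER_HOUR
verdict = "meets" if restore_hours <= RTO_HOURS else "misses"
print(f"Estimated full restore: {restore_hours:.1f} hours ({verdict} a {RTO_HOURS}-hour RTO)")
# 90 TB at 28 TB/hour is roughly 3.2 hours, inside a 4-hour RTO in this example
```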

BIG-TIME TCO SAVINGS

Data deduplication with HP StoreOnce affords multiple opportunities for savings:

• Using HP Labs innovations such as sparse indexing means less (as much as 95 percent less) backup data stored on disk. Such algorithms, combined with the cost-effectiveness of the HP storage appliances that include HP StoreOnce technology, deliver a superior solution at appreciably lower cost than comparable competitive offerings.

• HP customer studies have shown that HP StoreOnce backup systems generate 50 percent TCO savings versus a traditional backup infrastructure, with the additional benefit of the faster recovery that disk-based backup provides.1

• HP StoreOnce allows for faster, more cost-effective configuration. According to an independent evaluation conducted in 2010 by the Evaluator Group, HP StoreOnce deduplication technology required fewer steps to configure: in one situation, just 11 steps, versus 33 steps for the competition. It did not even require looking at the manuals, according to the Evaluator Group.2

1. Source: "Complete Storage and Data Protection Architecture for VMware vSphere," HP, 2011. http://h20195.www2.hp.com/v2/getdocument.aspx?docname=4AA3-5141ENW.pdf

2. Source: "Top 10 Reasons Why You Should Choose HP StoreOnce" solution brief, HP, 2010. http://h20195.www2.hp.com/V2/GetPDF.aspx/4AA3-2347ENW.pdf


Suggested Reading
These additional resources include business white papers and previously published articles from IDG Enterprise.

////////////

////////////

Extend your data center's life expectancy

Companies can extend the life of their data centers by two to five years through a combination of IT strategies

By Sandra Gittlen, Computerworld

This year marks the 10th anniversary of the 1,200-square-foot data center at the Franklin W. Olin College of Engineering; that means the facility has been operating three years longer than CIO and vice president of operations Joanne Kossuth had originally planned. Now, even though the school needs a facility with more capacity and better connectivity, Kossuth has been forced to set aside the issue because of the iffy economic times. Demand has certainly increased over the years, pushing the data center to its limits, but the recession has tabled revamp discussions, she says.

Like many of her peers, including leaders at Citigroup and Marriott International, Kossuth has had to get creative to eke more out of servers, storage, and the facility itself. To do so, she's had to re-examine the life cycle of data and applications, storage array layouts, rack architectures, server utilization, orphaned devices and more.

Rakesh Kumar, research vice president at Gartner, says he's been bombarded by large organizations looking for ways to avoid the cost of a data center upgrade, expansion or relocation. Any data center investment costs at minimum tens of millions, if not hundreds of millions, of dollars. With a typical data center refresh rate of five to 10 years, that's a lot of money, so companies are looking for alternatives, he says. While that outlook might seem gloomy, Kumar finds that many companies can extract an extra two to five years from their data center by employing a combination of strategies, including consolidating and rationalizing hardware and software usage ... Read the full article

Recoup with data dedupe


Eight products that cut storage costs through data deduplication
By Logan G. Harbaugh, Network World

Backing up servers and workstations to tape can be a cumbersome process, and restoring data from tape even more so. While backing up to disk-based storage is faster and easier, and probably more reliable, it can also be more expensive. One way to get the best of both worlds is to back up to disk-based storage that uses deduplication, which increases efficiency by storing only one copy of any given piece of data.

While the process was originally used at the file level, many products now work at the block or sub-block (chunk) level, which means that even files that are mostly the same can be deduplicated, saving the space consumed by the parts that are the same. For instance, say someone opens a document and makes a few changes, then sends the new version to a dozen people. With file-level deduplication, the old and new versions are different files, though only one copy of the new version is stored. With block-level or sub-block-level deduplication, only the first document and the changes between the first document and the second are stored.

There is some debate about the optimum process: deduplication of files is not very efficient; blocks, more so; chunks, even more so. However, the smaller the chunks, the more processing it takes, and the bigger the indices are that keep track of duplicates. Some systems use variable-size chunks to tune this, depending on the type of data being stored.

The good news is that deduplication works well. In our tests, all of the products were able to create a second copy of a volume and use less than 1% additional space, and to back up a copy of the test volume with 4,552 files changed totaling 31.7 GB and use no more than 32 GB of additional space ... Read the full article
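The excerpt's point about chunk sizes can be illustrated with a small content-defined (variable-size) chunking sketch in Python. It is a simplified, generic illustration rather than any shipping product's algorithm: the function names, the 16-byte window, the boundary mask and the minimum chunk size are all invented for the example. Because boundaries are chosen from the data itself, an insertion near the start of a file disturbs only the chunks around it instead of shifting every fixed-size block that follows.

```python
import hashlib
import os

def content_defined_chunks(data, window=16, mask=0x07FF, min_size=128):
    """Split data where a hash of the trailing `window` bytes matches a bit pattern.

    Simplified for illustration: real products use true rolling hashes (e.g. Rabin
    fingerprints) plus minimum and maximum chunk sizes, but the idea is the same:
    boundaries depend on local content, not on byte offsets.
    """
    chunks, start = [], 0
    for i in range(len(data)):
        if i - start < min_size or i + 1 < window:
            continue
        tail = data[i + 1 - window:i + 1]
        h = int.from_bytes(hashlib.sha256(tail).digest()[:4], "big")
        if h & mask == 0:              # boundary found in the content itself
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def digests(data):
    return {hashlib.sha256(c).hexdigest() for c in content_defined_chunks(data)}

if __name__ == "__main__":
    base = os.urandom(64 * 1024)
    edited = base[:100] + b"a small insertion" + base[100:]   # edit near the start of the file
    shared = digests(base) & digests(edited)
    print(f"{len(shared)} of {len(digests(base))} chunks unchanged after the edit")
```

With fixed-size blocks, the same insertion would shift every subsequent block boundary and almost nothing would deduplicate; with content-defined chunks, typically only the chunk containing the edit changes.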



////////////

////////////

The Emergence of a New Generation of Deduplication Solutions:


Comparing HP StoreOnce vs. EMC Data Domain
By Edison Group

The HP StoreOnce deduplication technology, launched in 2010, helps IT organizations address the challenge of protecting and recovering exponentially growing amounts of data in the face of stagnant or incrementally increasing IT budgets. On November 29, HP launched the latest iteration of its StoreOnce deduplication portfolio. The HP B6200 StoreOnce Backup System provides enterprise-class scale-out capabilities and autonomic restart of backup jobs for high data availability. The autonomic restart feature, an important differentiator, is designed to eliminate failed backups by pairing nodes within a couplet (there are two nodes in each couplet) so the surviving node can take over when its companion node fails.

To help current and potential customers understand the value of the appliance and the StoreOnce strategy, technology research firm Edison Group compared HP's B6200 StoreOnce offering to its nearest competitors, the EMC Data Domain 890 and Data Domain Global Deduplication Array. Edison considered a number of criteria that are of critical concern to today's data center IT managers in evaluating products. These include scalability (including capacity and performance), high availability, architectural approach, pricing, and licensing.

In the course of its research, Edison found that HP B6200 StoreOnce meets, and in many cases exceeds, Data Domain's published specifications. Notably, Edison also found the HP B6200 StoreOnce to be the only enterprise-class deduplication appliance to offer an autonomic restart feature, which provides industry-leading availability for big-data backups. Read the full article

HP StoreOnce: The Next Wave of Data Deduplication


By Enterprise Strategy Group

Leveraging deduplication in backup environments yields significant advantages. The cost savings from reducing disk capacity requirements change the economics of disk-based backup. For some organizations, it allows disk-based backup, and importantly recovery, to be extended to additional workloads in the environment. For others, deduplication makes it possible to introduce disk-based backup where it may not have been feasible before.

Deduplication in data protection is not new; however, it is being implemented in new ways. Its availability in secondary disk storage systems was the predominant delivery vehicle just a few years ago. Today, the technology is available as an integrated feature of backup software, cloud gateways, and software-as-a-service (SaaS) solutions, delivering bandwidth savings in addition to reduced storage capacity benefits. In addition to distributing deduplication processing across multiple points in the backup data path, there are many more deduplication techniques and approaches today, too. Vendors are perfecting and optimizing algorithms that identify and eliminate redundancy to meet the ever-changing requirements driven by relentless data growth and IT's desire to keep pace with the volume of data under management.

The evolution of deduplication is being provoked by user requirements, as well as improvements in IT infrastructure, including larger, faster disk drives and APIs facilitating better integration between data protection hardware and software components. IT organizations that have or plan to implement deduplication want greater flexibility in how and where deduplication is deployed, tighter integration with the backup policy engine and backup catalog, faster performance for backup and recovery, and the ability to deduplicate within and across domains to gain more efficiency. And they want it for the lowest cost possible. Read the full article

4AA3-9132ENW
