Redguides
for Business Leaders
Phil Francisco
Executive overview
Success in any enterprise depends on having the best available information in time to make sound
decisions. Anything less wastes opportunities, costs time and resources, and can even put the
organization at risk. But finding crucial information to guide the best possible actions can mean
analyzing billions of data points and petabytes of data, whether to predict an outcome, identify a trend,
or chart the best course through a sea of ambiguity. Companies with this type of intelligence on demand
react faster and make better decisions than their competitors.
This IBM Redguide publication introduces the Asymmetric Massively Parallel Processing
(AMPP) architecture, and describes how the system orchestrates queries and analytics to
achieve its unprecedented speed. You will see how software and hardware come together to
extract the maximum utilization from every critical component, and how a system optimized
for thousands of users querying huge data volumes really works. It is a unique data
warehouse and analytics platform with unparalleled price-performance, ready for today's
needs and tomorrow's challenges.
Architectural principles
Netezza technology integrates database, processing, and storage in a compact system
optimized for analytical processing and designed for flexible growth. The system architecture
is based on the following core tenets that have been a hallmark of Netezza technology
leadership in the industry:
of any complexity on stream against huge data volumes eliminates the delays and costs
incurred moving data to separate hardware. It accelerates performance by orders of
magnitude, making the PureData System for Analytics the ideal platform to converge data
warehousing with advanced analytics.
Appliance simplicity
By automating and streamlining day-to-day operations, the Netezza architecture shields
users from the underlying complexity of the platform. Simplicity rules whenever there is a
design tradeoff with any other aspect of the appliance. Unlike other solutions, it just runs,
handling demanding queries and mixed workloads with blistering speed, without the tuning
required by other systems. Even normally time-consuming tasks such as installation,
upgrades, and ensuring high availability and business continuity are vastly simplified, saving
precious time and resources.
operates on multiple data streams, filtering out extraneous data as early as possible. More
than a thousand of these customized MPP streams work together to divide and conquer the
workload.
[Figure: appliance architecture diagram. Recoverable labels: Advanced Analytics, BI, ETL, Loader; Hosts; Network Fabric; S-Blades (FPGA, CPU, Memory); Disk Enclosures.]
traffic. The network is optimized to scale to more than a thousand nodes, while allowing
each node to initiate large data transfers to every other node simultaneously.
Note: All system components are redundant. While the hosts are active-passive, all
other components in the appliance are hot swappable. User data is fully mirrored,
enabling better than 99.99% availability.
[Figure: S-Blade and host diagrams. Recoverable labels: Memory, FPGA, CPU, NIC; Compress, Project, Restrict, DMA; Execution Engine, FAST Engines; Netezza Host: Query Analysis, Optimizer, Compiler, Object Cache, Scheduler, Scheduling, System Catalog; Disk Enclosures; S-Blades; Network Fabric.]
Convert it to snippets
The compiler converts the query plan into executable code segments, called snippets, which
are query segments executed by Snippet Processors in parallel across all the data streams in
the appliance. Each snippet has two elements: compiled code executed by individual CPU
cores and a set of FPGA parameters to customize the FAST engines' filtering for that
particular snippet. This snippet-by-snippet customization allows the PureData System for
Analytics to provide, in effect, a hardware configuration optimized on the fly for individual
queries.
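The two-part structure of a snippet can be illustrated with a minimal sketch. It is not the appliance's actual implementation: the names FilterParams, Snippet, run_on_stream, and run_snippet are hypothetical, a thread pool stands in for the parallel Snippet Processors, and plain Python functions stand in for the FPGA settings and the compiled CPU code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FilterParams:
    """Hypothetical stand-in for the per-snippet FPGA parameters."""
    project: tuple                    # columns to keep
    restrict: Callable[[dict], bool]  # predicate selecting rows to keep


@dataclass(frozen=True)
class Snippet:
    """Pairs filter parameters with code that runs on the filtered rows."""
    filter_params: FilterParams
    cpu_code: Callable[[list], object]


def run_on_stream(snippet, stream):
    """Filter one data stream first, then run the CPU code on what is left."""
    p = snippet.filter_params
    filtered = [{c: row[c] for c in p.project} for row in stream if p.restrict(row)]
    return snippet.cpu_code(filtered)


def run_snippet(snippet, streams):
    """Run the same snippet over every stream in parallel; one result per stream."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda s: run_on_stream(snippet, s), streams))


# Two toy data streams (illustrative data only).
streams = [
    [{"name": "bob", "region": "east", "amount": 10},
     {"name": "jim", "region": "west", "amount": 20}],
    [{"name": "bob", "region": "west", "amount": 5},
     {"name": "ann", "region": "east", "amount": 7}],
]

snippet = Snippet(
    filter_params=FilterParams(
        project=("amount",),
        restrict=lambda row: row["name"] == "bob",
    ),
    cpu_code=lambda rows: sum(r["amount"] for r in rows),
)

partials = run_snippet(snippet, streams)  # one partial sum per stream
total = sum(partials)                     # combined result
```

In this sketch only the filter parameters and the CPU function change from one snippet to the next; the code that drives each stream stays the same, which mirrors the snippet-by-snippet customization described above.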
Intelligence in the compiler (the object cache): The host uses a feature called the
object cache to further accelerate query performance. This is a large cache of previously
compiled snippet code that supports parameter variations. For example, a snippet with the
clause where name = 'bob' might use the same compiled code as a snippet with the
clause where name = 'jim', but with settings that reflect the different name. This approach
eliminates the compilation step for over 99% of snippets.
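The idea of reusing compiled code across parameter variations can be sketched as a cache keyed by the shape of a clause rather than by its literal values. This is a toy illustration under stated assumptions, not the host's actual object cache: ObjectCache, normalize, and compile_shape are hypothetical names, string literals are assumed to be single-quoted, and the "compiler" handles only clauses of the form where <column> = <literal>.

```python
import re


def compile_shape(shape):
    """Toy compiler: supports only the shape `where <column> = ?`."""
    match = re.fullmatch(r"where (\w+) = \?", shape)
    if match is None:
        raise ValueError(f"unsupported clause shape: {shape}")
    column = match.group(1)
    # The compiled code takes the literal values as run-time settings.
    return lambda row, params: row[column] == params[0]


class ObjectCache:
    """Caches compiled code by clause shape, so literals can vary freely."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._cache = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def normalize(clause):
        """Split a clause into its shape and its quoted literal values."""
        params = tuple(re.findall(r"'([^']*)'", clause))
        shape = re.sub(r"'[^']*'", "?", clause)
        return shape, params

    def get(self, clause):
        """Return (compiled code, settings), compiling only on a cache miss."""
        shape, params = self.normalize(clause)
        if shape in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[shape] = self._compile(shape)
        return self._cache[shape], params


cache = ObjectCache(compile_shape)
code_bob, params_bob = cache.get("where name = 'bob'")
code_jim, params_jim = cache.get("where name = 'jim'")

row = {"name": "bob"}
matches_bob = code_bob(row, params_bob)
matches_jim = code_jim(row, params_jim)
```

Here the second lookup reuses the code compiled for the first one and differs only in its settings, so the compilation step runs once for both clauses.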
[Figure: recoverable labels: Query 1 through Query N; Disk; Memory; Network.]
Summary
The best solutions are not necessarily the biggest or most expensive; they are the ones that
have the smartest design. The PureData System for Analytics exploits the inherent advantage
that streaming processing provides over the traditional computing architectures used by other
analytic and data warehousing systems. The result is a compact appliance with performance
dwarfing that of much larger systems, with blinding speed for running complex algorithms
against huge data volumes and the mixed workloads created by thousands of concurrent
users. Processing performance is complemented by other capabilities that make IBM's
solution a unique platform to help businesses succeed, including:
Simplicity of use
The PureData System for Analytics is self-managed, as an appliance should be, and is
always running at its peak throughput. The system software ensures that without human
intervention.
Better decisions across the enterprise
Embedded functions bring a new generation of analytics into the database with minimum
development effort. There is no need for separate server hardware or time lost in massive
data transfers, just lightning-fast results and the ability to bring crucial business
intelligence to everyone who could benefit, in all sectors of an organization.
Agility for the future
The system is built not just for today's challenges, but for years to come, scaling linearly to
petabytes of user data and with performance acceleration far beyond the conventional
speed-up governed by Moore's Law.
PureData System for Analytics allows you and your company to make decisions with
maximum clarity while taking performance for granted. But do not just take our word for it. The
best way to appreciate PureData System for Analytics is to see it in action. We think you will
agree there is simply nothing else like it for making the most of your data.
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of
International Business Machines Corporation in the United States, other countries, or
both. These and other IBM trademarked terms are marked on their first occurrence in
this information with the appropriate symbol (® or ™), indicating US registered or
common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A
current list of IBM trademarks is available on the Web at
http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
Redbooks (logo)
IBM PureData
IBM
PureData
PureSystems
Redbooks
Redguide
Netezza, and N logo are trademarks or registered trademarks of Netezza Corporation, an IBM Company.
The following terms are trademarks of other companies:
Netezza, and N logo are trademarks or registered trademarks of IBM International Group B.V., an IBM
Company.
Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.