
Introduction to SAP HANA

Much has been written about SAP HANA. The technology has been variously described as transformative and wacko. Well, which is it?

Disclosures

I have a few disclosures to make before I continue my analysis and comments on HANA:

1. I worked at SAP for six years, as well as eight years at Oracle (plus a stint at Ingres before that).
2. I was at SAP when the technology underlying HANA was acquired, though I am referring to and using no trade secrets or proprietary information in preparing this analysis.
3. I attended this year's SAPPHIRE conference in Orlando, and SAP paid for my airfare and hotel.

Relational Databases

Relational databases have dominated the commercial information processing world for twenty years or more. There are many good reasons for this success:

1. Relational databases are suitable for a broad range of applications.
2. Relational databases enable relatively efficient access to data even when a query was not envisioned when the database was designed.
3. Today's relational databases are economical, available on a broad range of hardware and operating systems, generally compatible across vendors, performant for many queries, scalable to fairly large data volumes without resorting to partitioning, suitable for partitioning when larger scale is required, based on open standards, mature, and stable.
4. There is a large pool of developers, administrators, and designers, and an ecosystem of service providers, who are very knowledgeable about today's popular relational databases and available at economical rates of pay.

NoSQL, Columnar, and In-Memory Trend

There is an emerging trend toward databases designed to solve specific problems. While relational databases are good for solving many problems, it is easy to conceive of specific problems that are not well served by general-purpose databases. Relational databases are well suited to handling structured data where the schema does not change, where text processing is not an important requirement, where data is measured in gigabytes rather than petabytes, where geographical or time-series (e.g., stream) processing is not required, and where the server does not need to support transactional and decision-support queries simultaneously.

Some problems do not fit those criteria. The data set may be such that the schema varies from record to record, or over time. Text, image, blob, or geographical data may be a dominant data type. More and more frequently, applications manage "big data": huge volumes of data from millions of users or sensors. Some applications require simultaneous access to data for transactional updates as well as for aggregation in decision-support queries. For all of these cases, advanced architects and developers are looking at specialized data stores and data processing systems such as Hadoop, Cassandra, MongoDB, and others. These domain-specific data stores are known as NoSQL databases. There is some controversy over whether NoSQL means "no SQL" or "not only SQL". Regardless, these non-relational stores, such as Hadoop, are growing in popularity, but they are not really a replacement for relational data stores.

A key property of most commercial relational databases is their compliance with a set of properties known as ACID, which essentially guarantees that database transactions occur in a reliable way. Many NoSQL databases use techniques like eventual consistency to improve performance at the cost of temporarily inconsistent data, a sacrifice that is unsuitable for most business applications. After all, if you deposit money in a bank account, you want it to be available for withdrawal right away, not eventually.

Another trend in the database world is toward new methods of storing data, without eliminating the ACID properties that business applications need, and without sacrificing the SQL language that is so well known and widely supported. Two specific approaches are quite popular these days: columnar storage and in-memory databases. Column stores, such as HP's Vertica or SAP Sybase IQ, store data by column. By contrast, traditional SQL databases store data as rows. The benefit of storing data as rows is that it is often the fastest way to look up a single value, such as a salary, given a key value like the employee ID. Columnar databases group data by column. Within a column, generally speaking, all the data is of the same type. A columnar store therefore stores data of a single type all together, which enables advantages such as significant compression. Good compression can lead to reduced disk space requirements, memory requirements, and access times.
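To make the trade-off concrete, here is a minimal SQL sketch; the table and column names are hypothetical. The point lookup plays to a row store's strengths, while the aggregate plays to a column store's, since only the compressed columns involved have to be scanned.

    -- Row-oriented strength: fetch one whole record by key.
    -- The row is laid out contiguously, so a single read suffices.
    SELECT salary
      FROM employees
     WHERE employee_id = 4711;

    -- Column-oriented strength: aggregate one column across many rows.
    -- Only the compressed salary and department columns are scanned;
    -- names, addresses, and other columns are never touched.
    SELECT department, AVG(salary)
      FROM employees
     GROUP BY department;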

In-memory databases take advantage of two hardware trends: a significant reduction in the cost of RAM, and a significant increase in the amount of addressable memory in today's computers. It is now possible, and economically feasible, to put an entire database in memory for fast data management and querying. Using columnar or other compression approaches, even larger data sets can be loaded entirely into main memory. With high-speed access to memory-resident data, more users can be supported on a single machine. Also, with an in-memory database, both transactional and decision-support queries can be supported on a single machine, meaning that there can be zero latency between data appearing in the system and that data being available to decision-support applications. In a traditional set-up, where data resides in the operational store and is then extracted into a data warehouse for reporting and analysis, there is always a lag between data capture and its availability for analysis.

SAP HANA

Several years ago, SAP acquired Transactions In Memory, a company that had developed an in-memory database. In the years since, at virtually every annual SAPPHIRE conference, SAP has discussed how this in-memory technology would revolutionize business computing, but I personally found the explanations to be somewhat short on convincing details.

Even the name, HANA, has changed in meaning over the years. Initially, the name stood for Hasso's New Architecture (and a beautiful vacation spot in Maui, Hawaii) and referred only to the software. Today, HANA stands for High-Performance Analytical Appliance, and refers to the software and the hardware appliance on which it is shipped. In addition, HANA has evolved from a data warehousing database into a more general-purpose platform. SAP HANA does manage data in memory, for nearly incredible performance in some applications, but it also persists that data on disk, making it suitable for analytical applications and transactional applications simultaneously. But HANA's capabilities do not end there, and that may be the key to HANA's long-term value.

In the short term, it seems that SAP still struggles to generate references for HANA, other than in a narrow set of custom data-warehouse-type analytics. That may obscure where HANA can really deliver its first market successes. When HANA is generally available, it is expected to include both SQL and MDX interfaces, meaning that it can be easily dropped into Business Objects environments to dramatically improve performance. Some Business Objects analyses, whether in the Business Objects client or in Excel, can achieve orders-of-magnitude performance improvements with very little effort. Imagine reports that used to take a minute to run now running instantaneously. Imagine the satisfaction of your BOBJ user community if all or most of their reports and analyses ran instantaneously. Line-of-business users will pay for this capability, and that will open the door for SAP HANA in Business Objects accounts. After HANA gets in the door, I'm sure the CIO will find plenty of additional uses for it. This is huge, and it will generate truckloads of money for SAP while also making customers super-satisfied.

And think of what SAP HANA means for competitive comparisons with Oracle, SAP's arch-enemy. Larry wants to sell you Exalogic and Exadata machines, costing millions; Hasso wants to sell you a simple, low-end, commodity device delivering the same benefits. If I were SAP, I'd have sales reps with HANA software installed on their laptops, demonstrating it at every customer interaction, comparing it (favorably) with Oracle Exadata, and suggesting that customers demand that Oracle sales reps bring an Exadata box to their next sales call, and not bother showing up without one. Larry wants to sell you a cloud in a box; SAP will sell you apps on the cloud, or analytics in a box, for hundreds or a thousand times lower cost than Oracle's solution.

The longer-term benefits of HANA will require new software to be written: software that takes advantage of objects managed in main memory, with logic pushed down into the HANA layer. I'll post more on this potential in the future, but just think of what instantaneous processing of enormous data sets will mean to business: continuous supply chain optimization, real-time pricing, automated and excellent customer service, and much more.

Summary

In the long run, SAP HANA may indeed revolutionize enterprise business applications, but that remains to be seen. Right now, SAP HANA should be capable of creating substantial customer benefits and generating a very large revenue stream for SAP.

SAP HANA - Overview and Architecture


HANA, the High-Performance Analytic Appliance, is an in-memory appliance for SAP systems. Below are my notes/highlights from a HANA webinar I attended recently.

Overview and Architecture of HANA


What is HANA?

- In-memory computing engine, with the In-Memory Computing Studio as a front end for modeling and administration.
- HANA is connected to ERP systems; the front-end modeling studio can be used for load control and replication server management.
- Two types of relational data stores in HANA: Row Store and Column Store.
- SAP BOBJ tools can report directly on HANA; data from HANA can also be used in MS Excel.
- Row Store: a traditional relational store; the difference is that in HANA all the rows are in memory, whereas in traditional databases they are stored on a hard drive.
- Column Store: the data is stored in columns, as in SAP BWA.
- Persistence Layer: in-memory storage is great, but it is volatile, and data can be lost through power outages or hardware failures. To avoid this, HANA has a Persistence Layer component that makes sure all the data in memory is also stored on a hard drive, which is not volatile.
- Session Management: this component takes care of logon services.
- Two processing engines: the data is in memory, which is good, but how do I extract and report on it? HANA has two processing engines: one accepts SQL queries, and the other is based on MDX (see the sketch below).
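As a rough illustration of the two engines, here is the same hypothetical question posed to each; the table, cube, and member names are invented for the sketch, not taken from a real system.

    -- SQL engine: relational query against a HANA table.
    SELECT region, SUM(sales_amount)
      FROM sales
     GROUP BY region;

    -- MDX engine: the equivalent multidimensional query against a
    -- cube-style model (e.g., an analytic view exposed to Excel or BOBJ).
    SELECT [Measures].[Sales Amount] ON COLUMNS,
           [Region].Members          ON ROWS
      FROM [Sales Analytic View]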

HANA supports the Sybase Replication Server, which can be used for real-time synchronization of data between ERP and HANA.

Modeling Studio
Using the Modeling Studio, you can:

- Specify which tables are stored in HANA; the first part is to get the metadata, and then schedule data replication jobs.
- Manage Data Services to load data from SAP BW and other third-party systems.
- Manage connections to ERP instances; the current release does not support connecting to several ERP instances.
- Use Data Services for the modeling.
- Do modeling in HANA itself (this is independent of Data Services). Modeling can also be done in Business Objects Universes, which is nothing but joining fact and dimension tables.

Reporting

Client tools can access HANA directly: MS Excel, SAP BI 4.0 reporting tools, the Dashboard Design tool (Xcelsius), etc. Third-party reporting tools can leverage the ODBC, JDBC, and ODBO (for MDX requests) drivers in HANA for reporting. HANA also supports the BICS interface.

Request Processing and Execution Control


- SQL Script and MDX statements are passed to calculation models. An optimizer included in the calculation engine optimizes them for better performance.
- Calc Engine:
o A modeler can define data sources as inputs and different operations (join, aggregation, projection) on top of them for data manipulation.
o The calc engine will break a model up into sub-processes for cost-based, optimized performance (see the SQLScript sketch below).
o The system will use maximum resources to achieve maximum throughput.
- Planning Engine: will be included in the next release, and will include planning functions such as distribute and copy.
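The dataflow idea is easiest to see in SQLScript, where a model's operations become table variables that the engine can split into sub-processes and optimize. This is only a sketch under assumed names: the sales and products tables, and the tt_result table type, are hypothetical.

    -- SQLScript sketch: join, aggregation, and projection expressed as
    -- table variables, which the calc engine can optimize cost-based.
    -- tt_result is a previously created table type (CREATE TYPE ... AS TABLE).
    CREATE PROCEDURE sales_by_group (OUT result tt_result)
      LANGUAGE SQLSCRIPT READS SQL DATA AS
    BEGIN
      lt_joined = SELECT s.product_id, s.amount, p.product_group
                    FROM sales AS s
                    JOIN products AS p ON s.product_id = p.product_id;

      lt_agg = SELECT product_group, SUM(amount) AS total_amount
                 FROM :lt_joined
                GROUP BY product_group;

      result = SELECT product_group, total_amount FROM :lt_agg;
    END;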

Row Store

One of the relational engines, which stores data in row format.

o Pure in-memory store (future versions will also have the option of a disk-based store)
o In-memory object store (in future) for liveCache functionality

- Transactional Version Memory is the heart of the row store.
- Row store architecture:
o Write operations mainly go into "Transactional Version Memory".
o INSERT also writes to the persisted segment.
o Visible versions are moved from memory to the persisted segment.
o Outdated record versions are cleared from Transactional Version Memory.
o Row store tables have a primary index.
o The Row ID maps to the primary key.
o Secondary indexes can be created (see the sketch below).
o The Row ID contains the segment and the page for the record.
o Indexes in the row store exist only in memory.
o The index definition is stored with the table metadata.
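For instance, a secondary index on a row store table is created with ordinary SQL; the table and column names here are hypothetical.

    -- Create a secondary index on a row store table. The index exists
    -- only in memory; its definition is kept in the table metadata, so
    -- it can be rebuilt when the table is loaded.
    CREATE INDEX idx_orders_customer ON orders (customer_id);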

Column Store

- Improves read performance significantly, and also improves write performance.
- Highly compressed data.
- No real files; virtual files.
- An optimizer and executor handle queries and the execution plan.
- Delta data for fast writes, with an asynchronous delta merge; a Consistent View Manager.
- The main store is compressed and read-optimized; data is read from the main store.
- The delta store is optimized for write operations; the asynchronous merge moves data from the delta store to the main store.
- Compression works by creating a dictionary and applying further compression methods.
- Even during the merge operation, the columnar table remains available for read and write operations. To fulfil this, a second delta and main storage are used internally.
- The merge operation can also be triggered manually with an SQL command (see the sketch below).
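A minimal sketch of that manual trigger, assuming a column table named SALES; in normal operation the asynchronous merge makes this unnecessary.

    -- Force the delta store of a column table to be merged into the
    -- read-optimized main store.
    MERGE DELTA OF "SALES";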

Persistence Layer

- A Persistence Layer is needed because main memory is volatile.
- Provides backup and restore functionality.
- One Persistence Layer takes care of both the row and column stores.
- Regular savepoints.
- Logs capture DB transactions since the last savepoint.
- Actions during system restart:
o The last savepoint must be restored, undo logs read so that uncommitted transactions saved with the last savepoint are rolled back, and redo logs applied.
o The complete content of the row store is loaded into memory during the start process.
o Flags can be set for the column store to specify which tables are loaded during system restart (see the sketch below).
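A sketch of setting such a flag, assuming a column table named SALES; the exact syntax may vary between HANA revisions.

    -- Mark all columns of a column table for preload, so the table is
    -- loaded into memory as part of the restart process.
    ALTER TABLE "SALES" PRELOAD ALL;

    -- Remove the flag again; columns are then loaded lazily on first access.
    ALTER TABLE "SALES" PRELOAD NONE;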

Modeling

- Modeling is only possible for column tables; the Information Modeler only works with column tables.
- Replication servers create tables in the column store by default.
- Data Services creates tables in the column store by default.
- SQL to create a column table: CREATE COLUMN TABLE. The store can be changed with ALTER TABLE (see the sketch below).
- System tables are created where they fit best. Schema SYS holds caches and the administrative tables of the engine, as well as tables from the statistics server.
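A short sketch of both statements, with hypothetical table and column names; the exact conversion syntax may differ between revisions.

    -- Create a table directly in the column store.
    CREATE COLUMN TABLE products (
      product_id    INTEGER PRIMARY KEY,
      product_group NVARCHAR(40),
      price         DECIMAL(15,2)
    );

    -- Move an existing table between the two stores.
    ALTER TABLE products ROW;     -- convert to the row store
    ALTER TABLE products COLUMN;  -- convert back to the column store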

In-Memory Computing Studio


- Built with Java-based Eclipse.
- A Navigator to access different HANA systems on the left, the Quick Launch view in the middle, and the Properties view at the bottom.
- Information Modeler features:
o Database views
o Choice to publish and consume at four levels of modeling: attribute view, analytic view ...
o Physical tables and Information Models
o Import/export of models and data source schemas; mass and selective load
o Landscapes
- The models are just virtual definitions; they don't store actual data.
- Analytic Views are like a cube model, where transaction data is connected to attribute views.
- Calc View: with custom functions and calculations.
- Modeling process flow:
o Import source system metadata
o Create Information Models
o Consume using BICS, SQL, or MDX
- Information Modeler terminology:
o Attributes: characteristics
o Measures: key figures
o Attribute Views: dimensions
o Analytic Views: cubes
o Calculation Views: similar to the VirtualProvider concept in BW
o Hierarchies: level-based on multiple attributes, or parent-child hierarchies
o Analytic Privilege: security object
- Navigation view: HANA instance -> HANA server name and instance number -> user database schema -> views, functions, and tables.

Information Models: Attribute Views, Analytic Views, Calculation Views, and Analytic Privileges.

- Attribute View:
o Attributes add context to data.
o Attributes are modeled using attribute views.
o Can be regarded as master data tables.
o Can be linked to fact tables in analytic views.
o A measure, e.g. weight, can be defined as an attribute.
o Table joins and properties: left outer, right outer, full outer, or text table; cardinality 1:1, N:1, or 1:N; language column.
o Content views and functions will be shipped with HANA.
- Analytic View:
o Similar to a cube.
o An analytic view does not store any data; the data is stored in column store tables or views, based on the analytic view structure.
o Attributes and measures (like key figures).
o Data Preview: similar to the LISTCUBE functionality.
- Calculation View:
o Define the table output structure.
o Write an SQL statement; ensure the selected fields correspond to the previously defined output structure (see the sketch below).
o SQL Scripts, unlike SQL procedures, can't change any data; they are read-only.
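A minimal sketch of such a script body, with invented table and field names. The last statement's fields must line up with the output structure defined for the view, and because the script is read-only, statements like INSERT or UPDATE are not allowed in it.

    -- Scripted calculation view body (sketch). Assume the output
    -- structure was defined as (region NVARCHAR(40), total DECIMAL(15,2)).
    var_out = SELECT region, SUM(amount) AS total
                FROM sales
               GROUP BY region;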

Other Notes:

- External tools can connect to HANA using the JDBC and ODBC drivers.
- HANA currently doesn't support the complete MDX set; it supports the Excel 2010 standard MDX.
- BWA hardware can be upgraded to HANA, provided the hardware is relatively new.
- BWA licenses can be transferred to HANA.
- ERP and BW can be connected to HANA using Data Services or Sybase Replication.
- There is no namespace concept in HANA at the moment, so two ERP instances can't be connected to one HANA system. However, this issue can be avoided using DataSources.
- CRM can also use HANA.
- Data from BW can be loaded into HANA using Data Services and InfoSpokes.
