
International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR)
ISSN 2249-6831
Vol. 2, Issue 2, June 2012, 96-101
TJPRC Pvt. Ltd.

A COMPARATIVE STUDY OF PARTITIONING TECHNIQUES IN VLDB


1C. SUBRAMANIAN, 2T. BHUVANESWARI & 3S.P. RAJAGOPALAN

1Research Scholar, Dr. MGR Educational & Research Institute University, Chennai, Tamil Nadu, India
2Assistant Professor, Govt. Arts & Science College for Women, Bargur, Tamil Nadu, India
3Professor Emeritus, Department of Computer Applications, Dr. MGR Educational & Research Institute University, Chennai, Tamil Nadu, India

ABSTRACT
While there is apparently no official or standard definition for the term Very Large Database (VLDB), it is sometimes used to describe databases occupying magnetic storage in the terabyte range and containing billions of table rows. Typically, these are decision support systems or transaction processing applications serving large numbers of users. A VLDB is an environment or storage space, managed by a relational database management system (RDBMS), consisting of vast quantities of information. Partitioning enhances the performance, manageability, and availability of a wide variety of applications and helps reduce the total cost of ownership for storing large amounts of data.

KEYWORDS: VLDB, Partition, Issues, DBMS

INTRODUCTION


Modern enterprises frequently run mission-critical databases containing upwards of several hundred gigabytes and, in many cases, several terabytes of data. These enterprises are challenged by the support and maintenance requirements of very large databases (VLDB) and must devise methods to meet those challenges. A VLDB is an environment or storage space, managed by a relational database management system (RDBMS), consisting of vast quantities of information. The term is sometimes used to describe databases occupying magnetic storage in the terabyte range and containing billions of table rows. Typically, these are decision support systems or transaction processing applications serving large numbers of users.

ISSUES WITH VLDB


• Backing up the database. With a VLDB, a daily backup of everything via RMAN or hot backup is simply not possible, because the backup cannot complete within a 24-hour window. Hardware-assisted techniques such as mirror splitting or delta (incremental) backups are required.

• Performance. It becomes important to consider radical changes such as removing referential integrity (RI) constraints, designing around full table scans, and bypassing the block buffer cache for the largest tables.

• Object count and size. The number or size of objects starts causing parts of Oracle to break or work less efficiently: there may be so many tables that it takes two minutes just to select them all, or an unexpected limit may be hit, such as the 2 TB disk size in ASM, forcing the use of larger disks.

• Maintenance tasks become a challenge in their own right. This could be statistics gathering, adding columns to a table, or recreating global indexes, all of which now take more time than the available maintenance windows allow (so part of the definition of a VLDB could come down to how active the database is and how small its maintenance windows are; even a 1 TB database could qualify as a VLDB).

PARTITIONING
Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions, which are entirely transparent to an application. SQL queries and DML statements do not need to be modified in order to access partitioned tables. However, after partitions are defined, DDL statements can access and manipulate individual partitions rather than entire tables or indexes. This is how partitioning can simplify the manageability of large database objects. Each partition of a table or index must have the same logical attributes, such as column names, data types, and constraints, but each partition can have separate physical attributes such as compression enabled or disabled, physical storage settings, and tablespaces. Partitioning is useful for many different types of applications, particularly applications that manage large volumes of data. OLTP systems often benefit from improvements in manageability and availability, while data warehousing systems benefit from performance and manageability.
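As a hedged illustration of this transparency, assume a hypothetical sales table that is already range-partitioned on sale_date with a partition named sales_2012_q1 (these object names are assumptions for the sketch, not objects defined in this paper). Queries and DML are written exactly as for a nonpartitioned table, while DDL can address one partition at a time:

    -- Queries and DML need no partition-specific syntax:
    SELECT SUM(amount)
    FROM   sales
    WHERE  sale_date >= DATE '2012-01-01';

    -- DDL, by contrast, can manipulate a single partition:
    ALTER TABLE sales TRUNCATE PARTITION sales_2012_q1;
    ALTER TABLE sales DROP PARTITION sales_2012_q1;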

Partitioning offers these advantages:

• It enables data management operations such as data loads, index creation and rebuilding, and backup and recovery at the partition level rather than on the entire table. This results in significantly reduced times for these operations.

• It improves query performance. In many cases, the results of a query can be obtained by accessing a subset of partitions rather than the entire table. For some queries, this technique (called partition pruning) can provide order-of-magnitude gains in performance (a sketch follows this list).

• It significantly reduces the impact of scheduled downtime for maintenance operations. Partition independence for partition maintenance operations lets you perform concurrent maintenance operations on different partitions of the same table or index. You can also run concurrent SELECT and DML operations against partitions that are unaffected by maintenance operations.

• It increases the availability of mission-critical databases if critical tables and indexes are divided into partitions to reduce the maintenance windows, recovery times, and impact of failures.

• Parallel execution provides specific advantages in optimizing resource utilization and minimizing execution time, and is key for scalability in a clustered environment. Parallel execution against partitioned objects is supported for queries as well as for DML and DDL.

• Partitioning enables faster data access within an Oracle database. Whether a database has 10 GB or 10 TB of data, partitioning can improve data access by orders of magnitude.

• Partitioning can be implemented without requiring any modifications to applications. For example, a nonpartitioned table can be converted to a partitioned table without modifying any of the SELECT or DML statements that access that table; it is not necessary to rewrite application code to take advantage of partitioning.
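For instance, as a sketch under the same assumed sales table partitioned by quarter on sale_date, a query whose predicate is on the partitioning column lets the optimizer prune partitions, and a parallel hint can spread the remaining work across partitions (the degree of 4 is an arbitrary choice for illustration):

    -- Partition pruning: only the partition(s) covering Q2 2012 are scanned.
    SELECT SUM(amount)
    FROM   sales
    WHERE  sale_date >= DATE '2012-04-01'
    AND    sale_date <  DATE '2012-07-01';

    -- Parallel execution against the partitioned table.
    SELECT /*+ PARALLEL(sales, 4) */ SUM(amount)
    FROM   sales;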

VLDB AND PARTITIONING


A very large database has no minimum absolute size. Although a VLDB is a database like smaller databases, it presents specific management challenges. These challenges are related to its sheer size and to the cost-effectiveness of performing operations, taken for granted on smaller databases, against a system of that size. Several trends have been responsible for the steady growth in database size:

• For a long time, systems were developed in isolation. Companies have started to see the benefits of combining these systems to enable cross-departmental analysis while reducing system maintenance costs. Consolidation of databases and applications is a key factor in the ongoing growth of database size.

• Many companies face regulations that set specific requirements for storing data for a minimum amount of time. These regulations generally result in more data being stored for longer periods.

• Companies grow organically and through mergers and acquisitions, causing the amount of generated and processed data to increase. At the same time, the user population that relies on the database for daily activities increases.

Partitioning is a critical feature for managing very large databases. Growth is the basic challenge that partitioning addresses, and partitioning enables a "divide and conquer" technique for managing the tables and indexes in the database, especially as they grow. Partitioning is the feature that allows a database to scale for very large datasets while maintaining consistent performance, without unduly increasing administrative or hardware resources. The benefits of partitioning are not limited to very large databases: while partitioning is a necessity for the largest databases in the world, even a database whose size is measured in megabytes can see the same type of performance and manageability benefits as the largest multi-terabyte systems.

A partition is a division of a logical database or its constituent elements into distinct independent parts. Database partitioning is normally done for manageability, performance, or availability reasons.


DISTRIBUTED DATABASE MANAGEMENT SYSTEM


Each partition may be spread over multiple nodes, and users at a node can perform local transactions on the partition. This increases performance for sites that have regular transactions involving certain views of data, while maintaining availability and security.

PARTITIONING METHODS

Range Partitioning

Range partitioning maps data to partitions based on ranges of partition key values that you establish for each partition. It is the most common type of partitioning and is often used with dates. For example, you might want to partition sales data into monthly partitions. Range partitioning is defined by the partitioning specification for a table or index in PARTITION BY RANGE (column_list) and by the partitioning specification for each individual partition in VALUES LESS THAN (value_list), where column_list is an ordered list of columns that determines the partition to which a row or an index entry belongs. These columns are called the partitioning columns. The values in the partitioning columns of a particular row constitute that row's partitioning key. An ordered list of values for the columns in the column list is called a value list. Each value must be either a literal or a TO_DATE or RPAD function with constant arguments. Only the VALUES LESS THAN clause is allowed; it specifies a non-inclusive upper bound for the partition. All partitions, except the first, have an implicit lower bound specified by the VALUES LESS THAN literal of the previous partition. Any values of the partitioning key equal to or higher than this literal are placed in the next higher partition; the highest partition is the one for which the MAXVALUE literal is defined. The keyword MAXVALUE represents a virtual infinite value that sorts higher than any other value for the data type, including the null value.

Hash Partitioning

Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to a partitioning key that you identify. The hashing algorithm evenly distributes rows among partitions, giving partitions approximately the same size. Hash partitioning is the ideal method for distributing data evenly across devices. It is also an easy-to-use alternative to range partitioning, especially when the data to be partitioned is not historical.

List Partitioning

List partitioning enables you to explicitly control how rows map to partitions. You do this by specifying a list of discrete values for the partitioning column in the description of each partition. This differs from range partitioning, where a range of values is associated with a partition, and from hash partitioning, where you have no control over the row-to-partition mapping. The advantage of list partitioning is that you can group and organize unordered and unrelated sets of data in a natural way.
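To make these three basic methods concrete, the sketches below show Oracle-style DDL. The table names, column names, boundary dates, and region codes (sales, customers_hash, orders_list, and so on) are illustrative assumptions, not objects taken from this paper.

    -- Range partitioning: quarterly sales partitions plus a MAXVALUE catch-all.
    CREATE TABLE sales (
        sale_id    NUMBER,
        sale_date  DATE,
        amount     NUMBER(10,2)
    )
    PARTITION BY RANGE (sale_date) (
        PARTITION sales_2012_q1 VALUES LESS THAN (TO_DATE('01-04-2012','DD-MM-YYYY')),
        PARTITION sales_2012_q2 VALUES LESS THAN (TO_DATE('01-07-2012','DD-MM-YYYY')),
        PARTITION sales_max     VALUES LESS THAN (MAXVALUE)
    );

    -- Hash partitioning: rows spread evenly across four partitions by customer_id.
    CREATE TABLE customers_hash (
        customer_id NUMBER,
        cust_name   VARCHAR2(100)
    )
    PARTITION BY HASH (customer_id)
    PARTITIONS 4;

    -- List partitioning: explicit region-to-partition mapping with a DEFAULT partition.
    CREATE TABLE orders_list (
        order_id NUMBER,
        region   VARCHAR2(20)
    )
    PARTITION BY LIST (region) (
        PARTITION p_east  VALUES ('NY', 'NJ', 'MA'),
        PARTITION p_west  VALUES ('CA', 'WA', 'OR'),
        PARTITION p_other VALUES (DEFAULT)
    );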


Composite Partitioning

Composite partitioning combines range partitioning with hash or list partitioning. Oracle Database first distributes data into partitions according to boundaries established by the partition ranges. Then, for range-hash partitioning, Oracle uses a hashing algorithm to further divide the data into subpartitions within each range partition.

Horizontal Partitioning

Horizontal partitioning involves putting different rows into different tables. Perhaps customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, and a view with a union might be created over both of them to provide a complete view of all customers.

Vertical Partitioning

Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves splitting columns across tables, but vertical partitioning goes beyond that and partitions columns even when they are already normalized. Different physical storage might also be used to realize vertical partitioning; storing infrequently used or very wide columns on a different device, for example, is a method of vertical partitioning. Done explicitly or implicitly, this type of partitioning is called "row splitting" (the row is split by its columns). A common form of vertical partitioning is to split dynamic data (slow to find) from static data (fast to find) in a table where the dynamic data is not used as often as the static data. Creating a view across the two newly created tables restores the original table with a performance penalty; however, performance will increase when accessing the static data, e.g. for statistical analysis.
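A minimal sketch of these ideas, again using assumed names: the composite example subpartitions a hypothetical sales_composite table by hash within each date range, and the horizontal example stitches the two ZIP-code tables mentioned above back together with a UNION ALL view (assuming both tables share the same column structure).

    -- Range-hash composite partitioning: ranges on sale_date,
    -- four hash subpartitions on customer_id within each range.
    CREATE TABLE sales_composite (
        sale_id     NUMBER,
        sale_date   DATE,
        customer_id NUMBER
    )
    PARTITION BY RANGE (sale_date)
    SUBPARTITION BY HASH (customer_id) SUBPARTITIONS 4 (
        PARTITION p_2012_h1 VALUES LESS THAN (TO_DATE('01-07-2012','DD-MM-YYYY')),
        PARTITION p_2012_h2 VALUES LESS THAN (TO_DATE('01-01-2013','DD-MM-YYYY'))
    );

    -- Horizontal partitioning by application design: a view over the two row subsets.
    CREATE VIEW Customers AS
        SELECT * FROM CustomersEast
        UNION ALL
        SELECT * FROM CustomersWest;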

CONCLUSIONS
In this research article, the different methods of and criteria for partitioning a very large database have been studied. Considering the factors and experiments discussed, range partitioning and hash partitioning give the best overall performance for information retrieval from large databases, and they can be recommended for meeting the maximum requirements of end users.

