C. SUBRAMANIAN¹, T. BHUVANESWARI² & S.P. RAJAGOPALAN³
¹ Research Scholar, Dr. MGR Educational & Research Institute University, Chennai, Tamil Nadu, India
² Assistant Professor, Govt. Arts & Science College for Women, Bargur, Tamil Nadu, India
³ Professor Emeritus, Department of Computer Applications, Dr. MGR Educational & Research Institute University, Chennai, Tamil Nadu, India
ABSTRACT
While there is apparently no official or standard definition for the term Very Large Database (VLDB), it is sometimes used to describe databases occupying magnetic storage in the terabyte range and containing billions of table rows. Typically, these are decision support systems or transaction processing applications serving large numbers of users: an environment, managed by a relational database management system (RDBMS), consisting of vast quantities of information. Partitioning enhances the performance, manageability, and availability of a wide variety of applications and helps reduce the total cost of ownership for storing large amounts of data.
Maintenance tasks become a challenge in their own right. This could be statistics gathering, adding columns to a table, or recreating global indexes, all of which now take more time than the available maintenance windows allow (so part of the definition of a VLDB can come down to how active a database is and how small its maintenance windows are; by that measure, even 1 TB could be a VLDB).
PARTITIONING
Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions, which are entirely transparent to an application. SQL queries and DML statements do not need to be modified in order to access partitioned tables. However, after partitions are defined, DDL statements can access and manipulate individual partitions rather than entire tables or indexes. This is how partitioning can simplify the manageability of large database objects. Each partition of a table or index must have the same logical attributes, such as column names, data types, and constraints, but each partition can have separate physical attributes such as compression enabled or disabled, physical storage settings, and tablespaces. Partitioning is useful for many different types of applications, particularly applications that manage large volumes of data. OLTP systems often benefit from improvements in manageability and availability, while data warehousing systems benefit from performance and manageability.
Restricting a query to only the relevant partitions (called partition pruning) can provide order-of-magnitude gains in performance. Partitioning also significantly reduces the impact of scheduled downtime for maintenance operations: partition independence lets you perform concurrent maintenance operations on different partitions of the same table or index, and you can run concurrent SELECT and DML operations against partitions that are unaffected by maintenance operations. Partitioning increases the availability of mission-critical databases when critical tables and indexes are divided into partitions, reducing maintenance windows, recovery times, and the impact of failures. Parallel execution provides specific advantages for optimizing resource utilization and minimizing execution time; it is supported for queries as well as for DML and DDL, and parallel execution against partitioned objects is key for scalability in a clustered environment. Partitioning enables faster data access within an Oracle database: whether a database has 10 GB or 10 TB of data, partitioning can improve data access by orders of magnitude. Partitioning can be implemented without requiring any modifications to your applications. For example, a nonpartitioned table can be converted to a partitioned table without modifying any of the SELECT or DML statements that access it; it is not necessary to rewrite the application code to take advantage of partitioning.
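The pruning idea can be sketched outside the database. The following is a minimal Python illustration, not Oracle's implementation; the table layout, partition keys, and rows are invented for the example:

```python
from datetime import date

# A "table" stored as one row list per partition, range-partitioned by
# sale month (hypothetical data; this only illustrates the idea).
partitions = {
    "2023_01": [{"sale_date": date(2023, 1, 5), "amount": 100}],
    "2023_02": [{"sale_date": date(2023, 2, 9), "amount": 250}],
    "2023_03": [{"sale_date": date(2023, 3, 1), "amount": 75}],
}

def amounts_for_month(month_key):
    """Partition pruning: the predicate identifies one partition, so
    only its rows are scanned instead of the whole table."""
    return [row["amount"] for row in partitions.get(month_key, [])]

print(amounts_for_month("2023_02"))  # scans only the February partition
```

A query whose predicate does not match the partitioning key would still have to scan every partition, which is why the choice of partitioning column matters.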
Range Partitioning
Range partitioning maps rows to partitions based on ranges of values of the partitioning key. You specify the key in PARTITION BY RANGE (column_list) and each partition's upper bound in VALUES LESS THAN (value_list), where column_list is an ordered list of columns that determines the partition to which a row or an index entry belongs. These columns are called the partitioning columns. The values in the partitioning columns of a particular row constitute that row's partitioning key. An ordered list of values for the columns in the column list is called a value list. Each value must be either a literal or a TO_DATE or RPAD function with constant arguments. Only the VALUES LESS THAN clause is allowed; it specifies a non-inclusive upper bound for the partition. All partitions, except the first, have an implicit lower bound given by the VALUES LESS THAN clause of the previous partition. Any value of the partitioning key equal to or higher than a partition's literal is placed in the next higher partition. The highest partition is the one whose bound is the keyword MAXVALUE, which represents a virtual infinite value that sorts higher than any other value for the data type, including the null value.
Hash Partitioning
Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to a partitioning key that you identify. The hashing algorithm evenly distributes rows among partitions, giving partitions approximately the same size. Hash partitioning is the ideal method for distributing data evenly across devices, and it is an easy-to-use alternative to range partitioning, especially when the data to be partitioned is not historical.
List Partitioning
List partitioning enables you to explicitly control how rows map to partitions by specifying a list of discrete values for the partitioning column in the description of each partition. This differs from range partitioning, where a range of values is associated with a partition, and from hash partitioning, where you have no control over the row-to-partition mapping. The advantage of list partitioning is that you can group and organize unordered and unrelated sets of data in a natural way.
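As a rough sketch of how the three schemes assign a row to a partition, here is a small Python illustration. The bounds, partition count, and region values are invented, and crc32 merely stands in for Oracle's internal hash function:

```python
import bisect
import zlib

# Range: non-inclusive VALUES LESS THAN upper bounds. A key equal to a
# bound goes to the next higher partition, and anything beyond the last
# bound lands in the MAXVALUE-style overflow partition.
range_bounds = [100, 200, 300]

def range_partition(key):
    # index len(range_bounds) means the MAXVALUE (overflow) partition
    return bisect.bisect_right(range_bounds, key)

# Hash: a stable hash of the key modulo the partition count spreads
# rows roughly evenly across partitions.
def hash_partition(key, n_partitions=4):
    return zlib.crc32(str(key).encode()) % n_partitions

# List: explicit discrete values mapped to named partitions.
region_of_state = {"TX": "south", "FL": "south", "NY": "east", "NJ": "east"}

def list_partition(state):
    return region_of_state[state]
```

Note how the range rule mirrors the text: `range_partition(100)` returns partition 1, because a key equal to a partition's bound belongs to the next higher partition.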
Composite Partitioning
Composite partitioning combines range partitioning with hash or list partitioning. Oracle Database first distributes data into partitions according to the boundaries established by the partition ranges; then, for range-hash partitioning, it uses a hashing algorithm to further divide the data into subpartitions within each range partition.
Horizontal Partitioning
This involves putting different rows into different tables. For example, customers with ZIP codes less than 50000 might be stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, and a view with a union can be created over both of them to provide a complete view of all customers.
Vertical Partitioning
This involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also splits columns across tables, but vertical partitioning goes beyond that and partitions columns even when they are already normalized. Different physical storage can also be used to realize vertical partitioning; storing infrequently used or very wide columns on a different device, for example, is a form of vertical partitioning. Done explicitly or implicitly, this type of partitioning is called "row splitting" (the row is split by its columns). A common form of vertical partitioning is to separate dynamic data (slow to find) from static data (fast to find) in a table where the dynamic data is used less often than the static data. Creating a view across the two new tables restores the original table at a performance penalty, but performance increases when accessing only the static data, e.g. for statistical analysis.
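The horizontal split described above can be sketched in Python. The table names follow the text; the rows and the routing helper are invented for the illustration:

```python
# Two partition "tables" split horizontally by the row's ZIP code.
customers_east = [{"name": "Ada", "zip": 10001}]
customers_west = [{"name": "Bea", "zip": 90210}]

def insert_customer(customer):
    # Routing rule from the text: ZIP < 50000 goes to CustomersEast,
    # ZIP >= 50000 goes to CustomersWest.
    target = customers_east if customer["zip"] < 50000 else customers_west
    target.append(customer)

def all_customers():
    # The union "view" that presents both partitions as one table.
    return customers_east + customers_west

insert_customer({"name": "Cal", "zip": 30301})
```

Queries that filter on ZIP code can go straight to one partition table, while the union view keeps the split invisible to code that needs every customer.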
CONCLUSIONS
In this research article, we examined partitioning in very large databases and its different methods and criteria. Considering the factors and experiments discussed, range partitioning and hash partitioning give the best overall performance for information retrieval from large databases, and they can be recommended for meeting the requirements of end users.
10. Thomas V. Papathomas, Tiffany E. Conway, Ingemar J. Cox, Joumana Ghosn, Matt L. Miller, Thomas P. Minka, , and Peter N. Yianilos. Psychophysical studies of the performance of an image database retrieval system. In Proc. SPIE, 1998. 11. W. Greiff, W.B. Croft, and H. Turtle. PIC matrices: A computationally tractable class of probabilistic query operators. Technical Report IR132, The Center for Intelligent Information Retrieval, 1998. submitted to ACM TOIS. 12. W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, and C. Faloutsos. The QBIC project: querying images by content using color, texture and shape. Technical Report RJ 9203, IBM Research Division, 1993. 13. Haizhou Li, Bin Ma, and Chin-Hui Lee, A Vector Space Modeling Approach to Spoken Language Identification, in IEEE Transactions on Audio, speech, and Language processing vol. 15, No. 1, January 2007. 14. M. Basavaraju and Dr. R. Prabhakar, A Novel Method of Spam Mail Detection using Text Based Clustering Approach, in International Journal of Computer Applications (0975 8887), Vol.5 No.4, August 2010. 15. Christian Bockermann, Martin Apel, and Michael Meier, Learning SQL for Database Intrusion Detection Using Context-Sensitive Modelling (Extended Abstract), in U. Flegel and D. Bruschi (Eds.): DIMVA, LNCS 5587, pp. 196205, 2009.