Professional Documents
Culture Documents
What is an Index?
An index is used to increase read access performance. A book, having an index, allows rapid access to a particular subject area within that book. Indexing a database table provides rapid location of specific rows within that table, where indexes are used to optimize the speed of access to rows. When indexes are not used or are not matched by SQL statements submitted to that database then a full table scan is executed. A full table scan will read all the data in a table to find a specific row or set of rows, this is extremely inefficient when there are many rows in the table.
It is often more efficient to full table scan small tables. The optimizer will often
assess full table scan on small tables as being more efficient than reading both index and data space, particularly where a range scan rather than an exact match would be used against the index. An index of columns on a table contains a one-to-one ratio of rows between index and indexed table, excluding binary key groupings, more on this later. An index is effectively a separate table to that of the data table. Tables and indexes are often referred to as data and index spaces. An index contains the indexed columns plus a ROWID value for each of those column combination rows. When an index is searched through the indexed columns rather than all the data in the row of a table is scanned. The index space ROWID is then used to access the table row directly in the data space. An index row is generally much smaller than a table row, thus more index rows are stored in the same physical space, a block. As a result less of the database is accessed when using indexes as opposed to tables to search for data. This is the reason why indexes enhance performance.
Types of indexes from numerous forms of indexes available. How does SQL behave with indexes? What should be indexed? What should not be indexed?
Most database experts recommend a maximum of three columns for composite keys.
Types of Indexes
There are different types of indexes available in different databases. These different indexes are applicable under specific circumstances, generally for specific search patterns, for instance exact matches or range matches. The simplest form of indexing is no index at all, a heap structure. A heap structure is effectively a collection of data units, rows, which is completely unordered. The most commonly used indexed structure is a Btree (Binary Tree). A Btree index is best used for exact matches and range searches. Other methods of indexing exist. 1. Hashing algorithms produce a pre-calculated best guess on general row location and are best used for exact matches. 2. ISAM or Indexed Sequential Access Method indexes are not used in Oracle. 3. Bitmaps contain maps of zeros and 1s and can be highly efficient access methods for read-only data. 4. There are other types of indexing which involve clustering of data with indexes. In general every index type other than a Btree involves overflow. When an index is required to overflow it means that the index itself cannot be changed when rows are added, changed or removed. The result is inefficiency because a search to find overflowing data involves a search through originally indexed rows plus overflowing rows. Overflow index space is normally not ordered. A Btree index can be altered by changes to data. The only exception to a Btree index coping with data changes in Oracle is deletion of rows. When rows are deleted from a table, physical space previously used by the index for the deleted row is never reclaimed unless the index is rebuilt. Rebuilding of Btree indexes is far less common than that for other types of indexes since non-Btree indexes simply overflow when row changes are applied to them. Oracle uses has the following types of indexing available.
Btree index. A Btree is a binary tree. General all-round index and common in OLTP systems. An Oracle Btree index has three layers, the first two are branch node layers and the third, the lowest, contains leaf nodes. The branch nodes contain pointers to the lower level branch or leaf node. Leaf nodes contain index column values plus a ROWID pointer to the table row. The branch and leaf nodes are optimally arranged in the tree such that each branch will contain an equal number of branch or leaf nodes. Bitmap index. Bitmap containing binary representations for each row. A zero implies that a row does not have a specified value and a 1 denotes that row having that value. Bitmaps are very susceptible to overflow in OLTP systems and should only be used for read-only data such as in Data Warehouses. Function-Based index. Contains the result of an expression pre-calculated on each row in a table. Index Organized Tables. Clusters index and data spaces together physically for a single table and orders the merged physical space in the order of the index, usually the primary key. An index organized table is a table as well as an index, the two are merged. Clusters. Partial merge of index and data spaces, ordered by an index, not necessarily the primary key. A cluster is similar to an index organized table except that it can be built on a join (more than a single table). Clusters can be ordered using binary tree structures or hashing algorithms. A cluster could also be viewed as a table as well as an index since clustering partially merges index and data spaces. Bitmap Join index. Creates a single bitmap for one table in a join. Domain index. Specific to certain application types using contextual or spatial data, amongst others.
Indexing Attributes
Various types of indexes can have specific attributes or behaviors applied to them. These behaviors are listed below, some are Oracle specific and some are not. Ascending or Descending. Indexes can be order in either way. Uniqueness. Indexes can be unique or non-unique. Primary keys must be unique since a primary key uniquely identifies a row in a table referentially. Other columns such as names sometimes have unique constraints or indexes, or both, added to them. Composites. A composite index is an index made up of more than one column in a table.
Compression. Applies to Btree indexes where duplicated prefix values are removed. Compression speeds up data retrieval but can slow down table changes. Reverse keys. Bytes for all columns in the index are reversed, retaining the order of the columns. Reverse keys can help performance in clustered server environments (Oracle8i Parallel Server / RAC Oracle9i) by ensuring that changes to similar key values will be better physically spread. Reverse key indexing can apply to rows inserted into OLTP tables using sequence integer generators, where each number is very close to the previous number. When searching for and updating rows with sequence identifiers, where rows are searched for Null values. Null values are generally not included in indexes. Sorting (NOSORT). This option is Oracle specific and does not sort an index. This assumes that data space is physically ordered in the desired manner.
What to Index
Use indexes where frequent queries are performed with where and order by clause matching the ordering of columns in those indexes. Use indexing generally on larger tables or multi-table, complex joins. Indexes are best created in the situations listed below. Columns used in joins. Columns used in where clauses. Columns used in order by clauses. In most relational databases the order-by clause is generally executed on the subset retrieved by the where clause, not the entire data space. This is not always unfortunately the case for Oracle.
Traditionally the order-by clause should never include the columns contained in the
where cause. The only case where the order-by clause will include columns contained in the where clause is the case of the where clause not matching any index in the database or
a requirement for the order by clause to override the sort order of the where, typically in highly complex, multi-table joins. The group-by clause can be enhanced by indexing when the range of values being grouped is small in relation to the number of rows in the table selected.
statement is executed the more finely it should be tuned. Additionally SQL statements executed many times more often than other SQL statements can cause issues with locking. SQL code using bind variables will execute much faster than those not. Constant re-parsing of similar SQL code can over stress CPU time resources. Tuning is not necessarily a never-ending process but can be iterative. It is always best to take small steps and then assess improvements. Small changes are always more manageable and more easier to implement. Use the Oracle performance views plus tools such as TKPROF, tracing, Oracle Enterprise Manager, Spotlight, automated scripts and other tuning tools or packages which aid in monitoring and Oracle performance tuning. Detection of bottlenecks and SQL statements causing problem is as important as resolving those issues. In general tuning falls into three categories as listed below, in order of importance and performance impact. 1. Data model tuning. 2. SQL statement tuning. 3. Physical hardware and Oracle database configuration.
Physical hardware and Oracle database configuration installation will, other than
bottleneck resolution, generally only affect performance by between 10% and 20%. Most performance issues occur from poorly developed SQL code, with little attention to SQL tuning during development, probably causing around 80% of general system performance problems. Poor data model design can cause even more serious performance problems than SQL code but it is rare because data models are usually built more carefully than SQL code. It is a common problem that SQL code tuning is often left to DBA personnel. DBA people are often trained as Unix Administrators, SQL tuning is conceptually a programming skill; programming skills of Unix Administrators are generally very lowlevel, if present at all, and very different in skills requirements to that of SQL code.
Oracle9i can automatically monitor index usage using the ALTER INDEX index
name [NO]MONITORING USAGE; command with subsequent selection of the USED column from the V$OBJECT_USAGE column. Taking an already constructed application makes alterations of any kind much more complex. Pay most attention to indexes most often utilized. Some small static tables may not require indexes at all. Small static lookup type tables can be cached but will probably be force table-scanned by the optimizer anyway; table-scans may be adversely affected
by the addition of unused superfluous indexes. Sometimes table-scans are faster than anything else. Consider the use of clustering, hashing, bitmaps and even index organized tables, only in Data Warehouses. Many installations use bitmaps in OLTP databases, this often a big mistake! If you have bitmap indexes in your OLTP database and are having performance problems, get rid of them! Oracle recommends the profligate use of function-based indexes, assuming of course there will not be too many of them. Do not allow too many programmers to create their indexes, especially not function-based indexes, because you could end-up with thousands of indexes. Application developers tend to be unaware of what other developers are doing and create indexes specific to a particular requirement where indexes may be used in only one place. Some DBA control and approval process must be maintained on the creation of new indexes. Remember, every table change requires a simultaneous update to all indexes created based on that table.
Avoid ad-hoc SQL if possible. Any functionality, not necessarily business logic, is
always better provided at the application level. Business logic, in the form of referential
integrity, is usually best catered for in Oracle using primary and foreign key constraints and explicitly created indexes. Nested subquery SQL statements can become over complicated and impossible for even the most brilliant coder to tune to peak efficiency. The reason for this complexity could lie in an over-Normalized underlying data model. In general use of subqueries is a very effective approach to SQL code performance tuning. However, the need to utilize intensive, multi-layered subquery SQL code is often a symptom of a poor data model due to requirements for highly complex SQL statement joins.
Equijoins and Column Value Transformations AND and = predicates are the most efficient. Avoid transforming of column values in any form, anywhere in a SQL statement, for instance as shown below. SELECT * FROM <table name> WHERE TO_NUMBER(BOX_NUMBER) = 94066; And the example below is really bad! Typically indexes should not be placed on descriptive fields such as names. A function-based index would be perfect in this case but would probably be unnecessary if the data model and the data values were better organized. SELECT * FROM table1, table2 WHERE UPPER(SUBSTR(table1.name,1,1)) = UPPER(SUBSTR(table2.name,1,1)); Transforming literal values is not such a problem but application of a function to a column within a table in a where clause will use a table-scan regardless of the presence of indexes. When using a function-based index the index is the result of the function which the optimizer will recognize and utilize for subsequent SQL statements. Datatypes Try not to use mixed datatypes by setting columns to appropriate types in the first place. If mixing of datatypes is essential do not assume implicit type conversion because it will not always work, and implicit type conversions can cause indexes not to be used. Function-based indexes can be used to get around type conversion problems but this is not the most appropriate use of function-based indexes. If types must be mixed, try to place type conversion onto explicit values and not columns. For instance, as shown below. WHERE zip = TO_NUMBER('94066') as opposed to WHERE TO_CHAR(zip) = '94066' The DECODE Function The DECODE function will ignore indexes completely. DECODE is very useful in certain circumstances where nested looping cursors can become extremely complex. DECODE is intended for specific requirements and is not intended to be used prolifically, especially not with respect to type conversions. Most SQL statements containing DECODE function usage can be altered to use explicit literal selection criteria perhaps using separate SELECT statements combined with UNION clauses. Also Oracle9i contains a CASE statement which is much more versatile than DECODE and may be much more efficient. Join Orders
Always use indexes where possible, this applies to all tables accessed in a join, both those in the driving and nested subqueries. Use indexes between parent and child nested subqueries in order to utilize indexes across a join. A common error is that of accessing a single row from the driving table using an index and then to access all rows from a nested subquery table where an index can be used in the nested subquery table based on the row retrieved by the driving table. Put where clause filtering before joins, especially for large tables where only a few rows are required. Try to use indexes fetching the minimum number of rows. The order in which tables are accessed in a query is very important. Generally a SQL statement is parsed from top-tobottom and from left-to-right. The further into the join or SQL statement, then the fewer rows should be accessed. Even consider constructing a SQL statement based on the largest table being the driving table even if that largest table is not the logical driver of the SQL statement. When a join is executed, each join will overlay the result of the previous part of the join, effectively each section (based on each table) is executed sequentially. In the example below table1 has the most rows and table3 has the fewest rows. SELECT * FROM table1, table2, table3 WHERE table1.index = table2.index AND table1.index = table3.index; Hints Use them? Perhaps. When circumstances force their use. Generally the optimizer will succeed where you will not. Hints allow, amongst many other things, forcing of index usage rather than full table-scans. The optimizer will generally find full scans faster with small tables and index usage faster with large tables. Therefore if row numbers and the ratios of row between tables are known then using hints will probably make performance worse. One specific situation where hints could help are generic applications where rows in specific tables can change drastically depending on the installation. However, once again the optimizer may still be more capable than any programmer or DBA. Use INSERT, UPDATE and DELETE ... RETURNING When values are produced and contained in insert or update statements, such as new sequence numbers or expression results, and those values are required in the same transaction by following SQL statements, the values can be returned into variables and used later without recalculation of expression being required. This tactic would be used in PL/SQL and anonymous procedures. Examples are shown below. INSERT INTO table1 VALUES (test_id.nextval, 'Jim Smith', '100.12', 5*10) RETURNING col1, col4 * 2 INTO val1, val2;
UPDATE table1 SET name = 'Joe Soap' WHERE col1 = :val1 AND col2 = :val2; DELETE FROM table1 RETURN value INTO :array; Triggers Simplification or complete disabling and removal of triggers is advisable. Triggers are very slow and can cause many problems, both performance related and can even cause serious locking and database errors. Triggers were originally intended for messaging and are not intended for use as rules triggered as a result of a particular event. Other databases have full-fledged rule systems aiding in the construction of Rule-Based Expert systems. Oracle triggers are more like database events than event triggered rules causing other potentially recursive events to occur. Never use triggers to validate referential integrity. Try not to use triggers at all. If you do use triggers and have performance problems, their removal and recoding into stored procedures, constraints or application code could solve a lot of your problems. Data Model Restructuring Data restructuring involves partitioning, normalization and even denormalization. Oracle recommends avoiding the use of primary and foreign keys for validation of referential integrity, and suggests validating referential integrity in application code. Application code is more prone to error since it changes much faster. Avoiding constraint-based referential is not necessarily the best solution. Referential integrity can be centrally controlled and altered in a single place in the database. Placing referential integrity in application code is less efficient due to increased network traffic and requires more code to be maintained in potentially many applications. All foreign keys must have indexes explicitly created and these indexes will often be used in general application SQL calls, other than just for validation of referential integrity. Oracle does not create internal indexes when creating foreign reference keys. In fact Oracle recommends that indexes should be explicitly created for all primary, foreign and unique hey constraints. Foreign keys not indexed using the CREATE INDEX statement can cause table locks on the table containing the foreign key. It is highly likely that foreign keys will often be used by the optimizer in SQL statement filtering where clauses, if the data model and the application are consistent with each other structurally, which should be the case. Views Do not use views as a basis for SQL statements taking a portion of the rows defined by that view. Views were originally intended for security and access privileges. No matter what where clause is applied to a view the entire view will always be executed first. On the same basis also avoid things such as SELECT * FROM , GROUP BY clauses and
aggregations such as DISTINCT. DISTINCT will always select all rows first. Do not create new entities using joined views, it is better to create those intersection view joins as entities themselves; this applies particularly in the case of many-to-many relationships. Also Data Warehouses can benefit from materialized views which are views actually containing data, refreshed by the operator at a chosen juncture. Maintenance of Current Statistics and Cost Based Optimization Maintain current statistics as often as possible, this can be automated. Cost-based optimization, using statistics, is much more efficient than rule-based optimization. Regeneration and Coalescing of Indexes Indexes subjected to constant DML update activity can become skewed and thus become less efficient over a period of time. Use of Oracle Btree indexes implies that when a value is searched for within the tree, a series of comparisons are made in order to depth-first traverse down through the tree until the appropriate value is found. Oracle Btree indexes are usually only three levels, requiring three hits on the index to find a resulting ROWID pointer to a table row. Index searches, even into very large tables, especially unique index hits, not index range scans, can be incredibly fast. In some circumstances constant updating of a binary tree can cause the tree to become more heavily loaded in some parts or skewed. Thus some parts of the tree require more intensive searching which can be largely fruitless. Indexes should sometimes be rebuilt where the binary tree is regenerated from scratch, this can be done online in Oracle9i, as shown below. ALTER INDEX index name REBUILD ONLINE; Coalescing of indexes is a more physical form of maintenance where physical space chunks which are fragmented. Index fragmentation is usually a result of massive deletions from table, at once or over time. Oracle Btree indexes do not reclaim physical space as a result of row deletions. This can cause serious performance problems as a result of fruitless searches and very large index files where much of the index space is irrelevant. The command shown below will do a lot less than rebuilding but it can help. If PCTINCREASE is not set to zero for the index then extents could vary in size greatly and not be reusable. In that the only option is rebuilding. ALTER INDEX index name REBUILD COALESCE NOLOGGING;