You are on page 1of 41

Session No. DM214 Why, When and How Should I Change the Optimizer Statistics?

Eric Miner Development Engineer II ESD Optimizer Group eric.miner@sybase.com

Objectives
Understand the statistics and how they effect the optimizer Understand why you may need/want to add or modify statistics Know when and how to add or modify the statistics

Assumptions
Know what the optimizer is - and some basics of how it works and what it does Have access to ASE 11.9.2 or above - this presentation does not deal with versions prior to 11.9.2 Know the basics of optimizer analysis Have some experience with traceon 302 and optdiag

Caveats
Adding or changing the statistics is not something you must do. It is something you should be aware of and consider doing
After full testing of course

You will need SA role to do some of this Please test changes to statistics before implementing them You may want to consider using optdiag simulate to test changes to the statistics - see Tech Doc 20472 Using Optdiag Simulate Statistics Mode Your mileage may vary

What Are The Statistics?


They describe the table, its indexes and the data in a column to the optimizer

Used by the optimizer to estimate the most efficient way to access the data required by a query They are the optimizers only view of its universe
The statistics are all it knows about your dataset

Review Of Changes To The Statistics in ASE 11.9.2 & Above


There are two types of statistics - Table/Index level and Column level

Table/Index statistics are now centralized. Stored in systabstats


Are maintained dynamically by ASE Should not be written directly - will be quickly overwritten

Distribution statistics belong to a column, not an index. Stored in sysstatistics.


Are static, need to be updated or written Can be written directly

Most statistics stored as varbinary - use optdiag to read and write them

Why Would You Want To Add or Modify Statistics?


To make composite indexes more selective - adding statistics to inner index columns - Highly recommended To put statistics on non-indexed columns - good for join costing, no statistics on a column means assumptions are made To change the number of steps (cells) in a columns histogram
More granular histogram cells - Highly recommended

To change the default selectivity values - writing statistics To change the density values - writing statistics To add statistics to a columns histogram - writing statistics

Natural Column Statistics vs. Modified Column Statistics


The natural statistics are obtained by reading the data in a column.

Update statistics gathers and writes the natural statistics


Statistics based on the data and the state of the table/index

Default statistics Statistics obtained by running update statistics on a table or index. As youve done in pre-11.9.2 versions update statistics table_A [index_1] Writes statistics for the leading column of index(es) only

Natural Statistics vs. Modified Column Level Statistics cont.


Additional Statistics

Statistics on columns other than the leading column of an index - minor index columns and non-indexed columns
Changing the number of requested steps Modified statistics -

Any statistical value you write directly


Usually done via optdiag Only column level (sysstatistics) values should be written optdiag will not allow you to write to systabstats - will be quickly overwritten

A Note on Upgrade And The Statistics


The old distribution page is used to establish new statistics - stats from upgrade are less accurate

Values of the steps are used as boundary values in the new histogram.
Weights are approximated based on the steps of the distribution page FCs are created from duplicate step values, again weights are only approximated Density values are copied over Step counts are based on the number of steps in the old page

A Note on Upgrade And The Statistics cont.


DO NOT delete statistics after upgrade
You will lose step counts

DO run update statistics as soon as possible after the upgrade


This will place the new statistics on the leading columns of your indexes - the default statistics

New Update Statistics Flavors


Use update statistics to build/update column level statistics

update statistics table_name [ind_name]


Same as always, will build/update statistics on the leading columns of indexes on the table, or on the specified index update index statistics table_name [ind_name] This will build/ update column statistics on all columns of all indexes in the table, or on the specified index. Highly recommended, but not absolutely necessary WARNING - This option can take a long time to run.

New Update Statistics Flavors cont.


update all statistics table_name
This will create or update column level statistics on all columns of the table. It will also run update partition statistics. WARNING - This can take a VERY long time to run It is rarely necessary to run update all statistics. In the majority of cases its overkill Maintenance can become overwhelming

A Word About Maintaining Statistics


The more columns with statistics the more maintenance

Maintenance considerations include The time it takes to run update statistics on each column Editing and/or reading in an optdiag file Increased use of tempdb for updating column statistics
A worktable will be used to sort all inner index columns and non-indexed columns

Proc cache usage for sorts - dependant on size of datatype

Persistent And Non-Persistent Statistics


Most column level statistics will be overwritten by update statistics

All values based on the data are overwritten Density values, and all histogram values Range and In between default selectivity values are persistent You will need to reset non-persistent values after running update statistics - Read in an optdiag file

Adding Statistics - Next Step Beyond The Default Statistics


Statistics can be added to any column

Gives the optimizer more info about a composite index (more selective). On a non-indexed column - able to cost joins on non-indexed columns Not absolutely necessary, but highly recommended
update statistics table_name (col_name)- for single column update index statistics table_name [ind_name] - for all columns of composite index(es)

Does add maintenance, effects procedure cache Test it before implementing

Adding Statistics To Composite Index Columns - Example


Statistics on the leading column only - col_A
Estimated selectivity for col_A, selectivity = 0.900288, upper limit = 0.947370.

No statistics available for col_B or col_C,


col_B uses the default in between selectivity of 0.25 col_C uses selectivity of 0.10 for equi-SARG with no statistics
Estimating selectivity of index 't1_i1', indid 2 scan selectivity 0.900251,filter selectivity 0.056268 5627 rows, 6480 pages

Search argument selectivity is 0.005627

Adding Statistics To Composite Index Columns - Example


Statistics on all three columns of the index selectivity of col_C was 0.10
Estimated selectivity for col_C, selectivity = 0.062328, upper limit = 0.158700.

Estimating selectivity of index 't1_i1', indid 2 scan selectivity 0.900283, filter selectivity 0.000283 28 rows, 882 pages

Search argument selectivity is 0.000028.


Table: t1 scan count 1,logical reads:(regular=885 apf=0 total=885), physical reads:(regular=0 apf=0 total=0

Adding Statistics To Non-Indexed Columns


Useful in costing joins on non-indexed columns

Without statistics on the column there is no Total density value to use in costing joins Assumptions made about how many rows will qualify Not usually accurate - based on the join operator
Estimated selectivity for col_A, selectivity = 0.100000.

Statistics on the column allow the total density value to be used to estimate the number of qualifying rows.
Estimated selectivity for col_A, selectivity = 0.000025, upper limit = 0.081425.

Changing Requested Step Count


The number of steps has an effect on the optimizer

By default new column statistics are built using 20 steps (cells) If statistics exist the same step count will be reused unless you specify a different count Increasing the step count may result in more frequency count cells - know your data May help in optimization of range SARGs because cell granularity is increased - narrower cells (steps)

Changing Requested Step Count cont.


Cells consume procedure cache when read, the larger (wider) the datatype the more memory required for each cell
The more cells read the more the effect on parse and compile time Implement increased steps only after testing. In some cases you may not need to change the step count. After an upgrade do not delete statistics or change step counts until youve tested

Changing Requested Step Count How Many Steps?


When trying to get Frequency Count Cells (FCs)

FCs represent only one value in the column - most accurate since the weight is for only one value. Range Cells represent more than one value, uniform distribution within the cell is assumed FCs can be pulled out of a RC when the value is > 50% of a cell width and there are enough requested steps available
Cell width = number of rows/(number of requested steps -1)

Changing Requested Step Count How Many Steps? Cont.


An example - FCs for each distinct value

Table has 100K rows and 35 distinct values, the lowest number of rows occupied by a value is 6 (we want a FC for this value) The number of steps to request =
((rows * .50)/(rows for the value with least rows))+1

( (100,000*.5)/6)+1 = 8335 The final histogram will have either 36 or 71 cells depending on the type of frequency count cell This is an extreme example. You may not need to have an FC for each value in the table

Changing Requested Step Count How Many Steps? Cont.


Adding steps to decrease cell width - for range SARGs (more granular cells)

Assumes Range Cells - low degree of duplicates Try doubling requested step count - then work up from there Interpolation will establish how close a range SARG value is to a boundary value of a cell and then estimate the number of rows that qualify for the SARG.

The number of steps from upgrade is a good starting point

Changing Requested Step Count How To Do It


Extensions to create index and update statistics

create index I1 on T1 (colA) using X values update statistics T1 using X values X values = requested steps, seen in optdiag Remember - cells use procedure cache when read You may not need a lot of cells Again, dont change step counts after an upgrade until youve tested create index with 0 values will create index, but will not write the statistics

Writing The Statistics Directly


Use an optdiag input file to write the statistics directly

Always get an optdiag output file before writing or changing the statistics - as insurance Useful in general if you want to go back to a previous set of statistics -o output_file_name, -i input_file_name Save a clean copy of the output file Rename and edit output file for changes to the statistics

Maintaining Directly Written Statistics


Use optdiag input files to maintain statistics changes written directly

All changes to the column level statistics, other than the default selectivity values, will be over written by update statistics Traceon 302 output will display message when edited statistics are used in costing
Statistics for this column have been edited.

If you change non-persistent values you will need set them back after updating statistics

Changing Statistics - Data Skew and The Statistics


A few values occupy many rows while many values occupy a few rows - spikes in the distribution

Data skew will have an effect on the Total Density value and thus on the costing of joins. Possible that the estimated number of rows qualifying for a join from an inner table will be pessimistic - check traceon 310 Weighted averaging used to obtain the total density Change Total Density or leave it alone?

Changing Statistics - Data Skew and The Statistics cont.


Change the Total Density to what?

Test it first!! - the Total density value will be used for all queries that join the column One approach is to set Total Density equal to Range Cell density Be careful of a 0 density value, use something higher than 0 Maybe change it to the arithmetic average Changes to density values are not persistent - update statistics will over write them

Changing Statistics - Data Skew and The Statistics cont.


Example of data skew in the histogram Range cell density: 0.0000502421670203 Total density: 0.2697381850000000 Step Weight Value 1 2 3 4 5 6 7 8 9 0.00000000 0.05263000 0.05264000 0.05265000 0.05263000 0.04125000 0.00000000 0.31933998 0.05263000 <= <= <= <= <= <= < = <= 0 5222 10564 15779 20998 24999 40000 40000 96995

Changing Statistics - Data Skew and The Statistics cont.


sp_modifystats - New system stored procedure

Will set the Total Density to equal the Range Cell Density Available in 11.9.2.2 and 12.0.1- not yet documented Caution - remember that the Total Density will be used for all joins on the column. If the Range cell density is very low you may want to modify Total density using optdiag sp_modifystats tab_name, col_name, REMOVE_SKEW_FROM_DENSITY

Changing Statistics - Unknown Equi-SARG Values


Total density is used for the selectivity in the case of an unknown equi_SARG value declare @var int, select @var = 10, where col = @var Traceon 302 output sample Selecting best index for the SEARCH CLAUSE: tabA.col1 = unknown-value

If you see an unknown value message in 302 and the default selectivity is inefficient
Try to eliminate the unknown value, if not possible

Change the Total Density value

Changing Statistics - The Default Selectivity Values


The Default selectivity values are used for the selectivity of unknown range and between SARGs

Use optdiag to change the default selectivity values


Range (<,<=,>,>=) default value is 0.33 Between (between) default value is 0.25 Range selectivity:default used In between selectivity:default Edited Range selectivity: default In between selectivity:default

(0.33) used (0.25)

used 0.0033 used 0.0025

The default selectivity values are persistent

Changing Statistics - SARG Values That Are Out Of Range


If a value is greater than the largest or less than the smallest histogram value

Traceon 302 sample Estimated selectivity for colA, selectivity = 0.000000, upper limit = 0.000000. Lower bound search value 10000 is greater than the largest value in sysstatistics for this column.

Special costing is done Not always the most accurate value:


Selectivity of 0.00 or 1.00 depending on the value and the operator

Changing Statistics - SARG Values That Are Out Of Range


Two ways to effect the histogram for this Add a dummy row to the data
Not always practical or allowed - but is persistent

Add a dummy boundary value to the histogram via optdiag


Easier to do in some cases, but not persistent

Its July 8 and update statistics hasnt been run - step 20 is last
18 19 20 0.05301946 0.05290456 0.04818739 <= <= <= "May "Jun "Jul 1 2000 12:00AM" 1 2000 12:00AM" 1 2000 12:00AM"

Change last boundary value to Aug 1

Manipulating Histogram Cells


Histogram cells can be added or changed by directly writing them

Usually done to add FCs or a dummy boundary value A few rules apply when adding cells to the histogram:
The step numbers must increase monotonically. The weight of a cell must be between 0 and 1.0. The sum of the cell weights must be close to 1.0 (0.99 to 1.01)

Always test it before implementing it!! Save a copy (optdiag output) of the original stats

Writing Statistics In Place Or Running Update Statistics


Two ways to use optdiag instead of update statistics

The dump and load method - no editing of statistics required Dump dataset and load somewhere else Run update stats on the loaded dataset Get an optdiag output file for those tables you want new stats on Load the optdiag file into the original dataset May take as much time as running update statistics on the original dataset - but no interference with users

Writing Statistics In Place Or Running Update Statistics cont.


The optdiag method - requires editing of optdiag output files Get an optdiag output file of the tables you want to update statistics on (may also do this for individual columns) Edit the files to reflect changes in your dataset - youll need to understand what changes have occurred Read the file in via optdiag Very fast, care should be taken to insure that statistics are correct Test it before implementing it

Conclusion
ASE 11.9.2 and above allows you add or write statistics

Adding and writing statistics in not absolutely necessary Adding column level statistics is highly recommended in most cases Writing statistics is recommended only when necessary

More Optimizer Resources and Help


The latest Performance and Tuning Guide
Dont be put off by the ASE 12.0 in the title, it covers the 11.9.2 features and functionality too http://sybooks.sybase.com/onlinebooks/group-as/asg1200e

Any Whats New docs for a new ASE release Tech Docs at Sybase Support
http://techinfo.sybase.com/css/techinfo.nsf/Home

More Optimizer Resources and Help cont.


The Sybase Customer newsgroups
http://support.sybase.com/newsgroups

The Sybase list server


SYBASE-L@LISTSERV.UCSB.EDU

The external Sybase FAQ


http://www.isug.com/Sybase_FAQ/

Join the ISUG (ISUG Technical Journal, feature requests and more)
http://www.isug.com

You might also like