Why Changestats

Session No. DM214 Why, When and How Should I Change the Optimizer Statistics?
Eric Miner Development Engineer II ESD Optimizer Group eric.miner@sybase.com
Objectives
Understand the statistics and how they effect the optimizer Understand why you may need/want to add or modify statistics Know when and how to add or modify the statistics
Assumptions
Know what the optimizer is - and some basics of how it works and what it does Have access to ASE 11.9.2 or above - this presentation does not deal with versions prior to 11.9.2 Know the basics of optimizer analysis Have some experience with traceon 302 and optdiag
Caveats
Adding or changing the statistics is not something you must do. It is something you should be aware of and consider doing
After full testing of course
You will need SA role to do some of this Please test changes to statistics before implementing them You may want to consider using optdiag simulate to test changes to the statistics - see Tech Doc 20472 Using Optdiag Simulate Statistics Mode Your mileage may vary
What Are The Statistics?

They describe the table, its indexes and the data in a column to the optimizer
Used by the optimizer to estimate the most efficient way to access the data required by a query They are the optimizers only view of its universe
The statistics are all it knows about your dataset
Review Of Changes To The Statistics in ASE 11.9.2 & Above

There are two types of statistics - Table/Index level and Column level
Table/Index statistics are now centralized. Stored in systabstats

Are maintained dynamically by ASE Should not be written directly - will be quickly overwritten
Distribution statistics belong to a column, not an index. Stored in sysstatistics.

Are static, need to be updated or written Can be written directly
Most statistics stored as varbinary - use optdiag to read and write them
Why Would You Want To Add or Modify Statistics?

To make composite indexes more selective - adding statistics to inner index columns - Highly recommended To put statistics on non-indexed columns - good for join costing, no statistics on a column means assumptions are made To change the number of steps (cells) in a columns histogram
More granular histogram cells - Highly recommended
To change the default selectivity values - writing statistics To change the density values - writing statistics To add statistics to a columns histogram - writing statistics
Natural Column Statistics vs. Modified Column Statistics

The natural statistics are obtained by reading the data in a column.
Update statistics gathers and writes the natural statistics

Statistics based on the data and the state of the table/index
Default statistics Statistics obtained by running update statistics on a table or index. As youve done in pre-11.9.2 versions update statistics table_A [index_1] Writes statistics for the leading column of index(es) only
Natural Statistics vs. Modified Column Level Statistics cont.

Additional Statistics
Statistics on columns other than the leading column of an index - minor index columns and non-indexed columns
Changing the number of requested steps Modified statistics -
Any statistical value you write directly

Usually done via optdiag Only column level (sysstatistics) values should be written optdiag will not allow you to write to systabstats - will be quickly overwritten
A Note on Upgrade And The Statistics

The old distribution page is used to establish new statistics - stats from upgrade are less accurate
Values of the steps are used as boundary values in the new histogram.
Weights are approximated based on the steps of the distribution page FCs are created from duplicate step values, again weights are only approximated Density values are copied over Step counts are based on the number of steps in the old page
A Note on Upgrade And The Statistics cont.

DO NOT delete statistics after upgrade
You will lose step counts
DO run update statistics as soon as possible after the upgrade

This will place the new statistics on the leading columns of your indexes - the default statistics
New Update Statistics Flavors

Use update statistics to build/update column level statistics
update statistics table_name [ind_name]

Same as always, will build/update statistics on the leading columns of indexes on the table, or on the specified index update index statistics table_name [ind_name] This will build/ update column statistics on all columns of all indexes in the table, or on the specified index. Highly recommended, but not absolutely necessary WARNING - This option can take a long time to run.
New Update Statistics Flavors cont.

update all statistics table_name
This will create or update column level statistics on all columns of the table. It will also run update partition statistics. WARNING - This can take a VERY long time to run It is rarely necessary to run update all statistics. In the majority of cases its overkill Maintenance can become overwhelming
A Word About Maintaining Statistics

The more columns with statistics the more maintenance
Maintenance considerations include The time it takes to run update statistics on each column Editing and/or reading in an optdiag file Increased use of tempdb for updating column statistics
A worktable will be used to sort all inner index columns and non-indexed columns
Proc cache usage for sorts - dependant on size of datatype
Persistent And Non-Persistent Statistics

Most column level statistics will be overwritten by update statistics
All values based on the data are overwritten Density values, and all histogram values Range and In between default selectivity values are persistent You will need to reset non-persistent values after running update statistics - Read in an optdiag file
Adding Statistics - Next Step Beyond The Default Statistics

Statistics can be added to any column
Gives the optimizer more info about a composite index (more selective). On a non-indexed column - able to cost joins on non-indexed columns Not absolutely necessary, but highly recommended
update statistics table_name (col_name)- for single column update index statistics table_name [ind_name] - for all columns of composite index(es)
Does add maintenance, effects procedure cache Test it before implementing
Adding Statistics To Composite Index Columns - Example

Statistics on the leading column only - col_A
Estimated selectivity for col_A, selectivity = 0.900288, upper limit = 0.947370.
No statistics available for col_B or col_C,

col_B uses the default in between selectivity of 0.25 col_C uses selectivity of 0.10 for equi-SARG with no statistics
Estimating selectivity of index 't1_i1', indid 2 scan selectivity 0.900251,filter selectivity 0.056268 5627 rows, 6480 pages
Search argument selectivity is 0.005627
Adding Statistics To Composite Index Columns - Example

Statistics on all three columns of the index selectivity of col_C was 0.10
Estimated selectivity for col_C, selectivity = 0.062328, upper limit = 0.158700.
Estimating selectivity of index 't1_i1', indid 2 scan selectivity 0.900283, filter selectivity 0.000283 28 rows, 882 pages
Search argument selectivity is 0.000028.

Table: t1 scan count 1,logical reads:(regular=885 apf=0 total=885), physical reads:(regular=0 apf=0 total=0
Adding Statistics To Non-Indexed Columns

Useful in costing joins on non-indexed columns
Without statistics on the column there is no Total density value to use in costing joins Assumptions made about how many rows will qualify Not usually accurate - based on the join operator
Estimated selectivity for col_A, selectivity = 0.100000.
Statistics on the column allow the total density value to be used to estimate the number of qualifying rows.
Estimated selectivity for col_A, selectivity = 0.000025, upper limit = 0.081425.
Changing Requested Step Count

The number of steps has an effect on the optimizer
By default new column statistics are built using 20 steps (cells) If statistics exist the same step count will be reused unless you specify a different count Increasing the step count may result in more frequency count cells - know your data May help in optimization of range SARGs because cell granularity is increased - narrower cells (steps)
Changing Requested Step Count cont.

Cells consume procedure cache when read, the larger (wider) the datatype the more memory required for each cell
The more cells read the more the effect on parse and compile time Implement increased steps only after testing. In some cases you may not need to change the step count. After an upgrade do not delete statistics or change step counts until youve tested
Changing Requested Step Count How Many Steps?

When trying to get Frequency Count Cells (FCs)
FCs represent only one value in the column - most accurate since the weight is for only one value. Range Cells represent more than one value, uniform distribution within the cell is assumed FCs can be pulled out of a RC when the value is > 50% of a cell width and there are enough requested steps available
Cell width = number of rows/(number of requested steps -1)
Changing Requested Step Count How Many Steps? Cont.

An example - FCs for each distinct value
Table has 100K rows and 35 distinct values, the lowest number of rows occupied by a value is 6 (we want a FC for this value) The number of steps to request =
((rows * .50)/(rows for the value with least rows))+1
( (100,000*.5)/6)+1 = 8335 The final histogram will have either 36 or 71 cells depending on the type of frequency count cell This is an extreme example. You may not need to have an FC for each value in the table
Changing Requested Step Count How Many Steps? Cont.

Adding steps to decrease cell width - for range SARGs (more granular cells)
Assumes Range Cells - low degree of duplicates Try doubling requested step count - then work up from there Interpolation will establish how close a range SARG value is to a boundary value of a cell and then estimate the number of rows that qualify for the SARG.
The number of steps from upgrade is a good starting point
Changing Requested Step Count How To Do It

Extensions to create index and update statistics
create index I1 on T1 (colA) using X values update statistics T1 using X values X values = requested steps, seen in optdiag Remember - cells use procedure cache when read You may not need a lot of cells Again, dont change step counts after an upgrade until youve tested create index with 0 values will create index, but will not write the statistics
Writing The Statistics Directly

Use an optdiag input file to write the statistics directly
Always get an optdiag output file before writing or changing the statistics - as insurance Useful in general if you want to go back to a previous set of statistics -o output_file_name, -i input_file_name Save a clean copy of the output file Rename and edit output file for changes to the statistics
Maintaining Directly Written Statistics

Use optdiag input files to maintain statistics changes written directly
All changes to the column level statistics, other than the default selectivity values, will be over written by update statistics Traceon 302 output will display message when edited statistics are used in costing
Statistics for this column have been edited.
If you change non-persistent values you will need set them back after updating statistics
Changing Statistics - Data Skew and The Statistics

A few values occupy many rows while many values occupy a few rows - spikes in the distribution
Data skew will have an effect on the Total Density value and thus on the costing of joins. Possible that the estimated number of rows qualifying for a join from an inner table will be pessimistic - check traceon 310 Weighted averaging used to obtain the total density Change Total Density or leave it alone?
Changing Statistics - Data Skew and The Statistics cont.

Change the Total Density to what?
Test it first!! - the Total density value will be used for all queries that join the column One approach is to set Total Density equal to Range Cell density Be careful of a 0 density value, use something higher than 0 Maybe change it to the arithmetic average Changes to density values are not persistent - update statistics will over write them

Example of data skew in the histogram Range cell density: 0.0000502421670203 Total density: 0.2697381850000000 Step Weight Value 1 2 3 4 5 6 7 8 9 0.00000000 0.05263000 0.05264000 0.05265000 0.05263000 0.04125000 0.00000000 0.31933998 0.05263000 <= <= <= <= <= <= < = <= 0 5222 10564 15779 20998 24999 40000 40000 96995

sp_modifystats - New system stored procedure
Will set the Total Density to equal the Range Cell Density Available in 11.9.2.2 and 12.0.1- not yet documented Caution - remember that the Total Density will be used for all joins on the column. If the Range cell density is very low you may want to modify Total density using optdiag sp_modifystats tab_name, col_name, REMOVE_SKEW_FROM_DENSITY
Changing Statistics - Unknown Equi-SARG Values

Total density is used for the selectivity in the case of an unknown equi_SARG value declare @var int, select @var = 10, where col = @var Traceon 302 output sample Selecting best index for the SEARCH CLAUSE: tabA.col1 = unknown-value
If you see an unknown value message in 302 and the default selectivity is inefficient
Try to eliminate the unknown value, if not possible
Change the Total Density value
Changing Statistics - The Default Selectivity Values

The Default selectivity values are used for the selectivity of unknown range and between SARGs
Use optdiag to change the default selectivity values

Range (<,<=,>,>=) default value is 0.33 Between (between) default value is 0.25 Range selectivity:default used In between selectivity:default Edited Range selectivity: default In between selectivity:default
(0.33) used (0.25)
used 0.0033 used 0.0025
The default selectivity values are persistent
Changing Statistics - SARG Values That Are Out Of Range

If a value is greater than the largest or less than the smallest histogram value
Traceon 302 sample Estimated selectivity for colA, selectivity = 0.000000, upper limit = 0.000000. Lower bound search value 10000 is greater than the largest value in sysstatistics for this column.
Special costing is done Not always the most accurate value:

Selectivity of 0.00 or 1.00 depending on the value and the operator
Changing Statistics - SARG Values That Are Out Of Range

Two ways to effect the histogram for this Add a dummy row to the data
Not always practical or allowed - but is persistent
Add a dummy boundary value to the histogram via optdiag

Easier to do in some cases, but not persistent
Its July 8 and update statistics hasnt been run - step 20 is last
18 19 20 0.05301946 0.05290456 0.04818739 <= <= <= "May "Jun "Jul 1 2000 12:00AM" 1 2000 12:00AM" 1 2000 12:00AM"
Change last boundary value to Aug 1
Manipulating Histogram Cells

Histogram cells can be added or changed by directly writing them
Usually done to add FCs or a dummy boundary value A few rules apply when adding cells to the histogram:
The step numbers must increase monotonically. The weight of a cell must be between 0 and 1.0. The sum of the cell weights must be close to 1.0 (0.99 to 1.01)
Always test it before implementing it!! Save a copy (optdiag output) of the original stats
Writing Statistics In Place Or Running Update Statistics

Two ways to use optdiag instead of update statistics
The dump and load method - no editing of statistics required Dump dataset and load somewhere else Run update stats on the loaded dataset Get an optdiag output file for those tables you want new stats on Load the optdiag file into the original dataset May take as much time as running update statistics on the original dataset - but no interference with users
Writing Statistics In Place Or Running Update Statistics cont.

The optdiag method - requires editing of optdiag output files Get an optdiag output file of the tables you want to update statistics on (may also do this for individual columns) Edit the files to reflect changes in your dataset - youll need to understand what changes have occurred Read the file in via optdiag Very fast, care should be taken to insure that statistics are correct Test it before implementing it
Conclusion
ASE 11.9.2 and above allows you add or write statistics
Adding and writing statistics in not absolutely necessary Adding column level statistics is highly recommended in most cases Writing statistics is recommended only when necessary
More Optimizer Resources and Help

The latest Performance and Tuning Guide
Dont be put off by the ASE 12.0 in the title, it covers the 11.9.2 features and functionality too http://sybooks.sybase.com/onlinebooks/group-as/asg1200e
Any Whats New docs for a new ASE release Tech Docs at Sybase Support
http://techinfo.sybase.com/css/techinfo.nsf/Home
More Optimizer Resources and Help cont.

The Sybase Customer newsgroups
http://support.sybase.com/newsgroups
The Sybase list server

SYBASE-L@LISTSERV.UCSB.EDU
The external Sybase FAQ

http://www.isug.com/Sybase_FAQ/
Join the ISUG (ISUG Technical Journal, feature requests and more)
http://www.isug.com

Why Changestats

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Why Changestats

Uploaded by

Copyright:

Available Formats

Session No. DM214 Why, When and How Should I Change the Optimizer Statistics?

Eric Miner Development Engineer II ESD Optimizer Group eric.miner@sybase.com

What Are The Statistics?

Review Of Changes To The Statistics in ASE 11.9.2 & Above

Table/Index statistics are now centralized. Stored in systabstats

Distribution statistics belong to a column, not an index. Stored in sysstatistics.

Why Would You Want To Add or Modify Statistics?

Natural Column Statistics vs. Modified Column Statistics

Update statistics gathers and writes the natural statistics

Natural Statistics vs. Modified Column Level Statistics cont.

Any statistical value you write directly

A Note on Upgrade And The Statistics

A Note on Upgrade And The Statistics cont.

DO run update statistics as soon as possible after the upgrade

New Update Statistics Flavors

update statistics table_name [ind_name]

New Update Statistics Flavors cont.

A Word About Maintaining Statistics

Proc cache usage for sorts - dependant on size of datatype

Persistent And Non-Persistent Statistics

Adding Statistics - Next Step Beyond The Default Statistics

Does add maintenance, effects procedure cache Test it before implementing

Adding Statistics To Composite Index Columns - Example

No statistics available for col_B or col_C,

Search argument selectivity is 0.005627

Adding Statistics To Composite Index Columns - Example

Search argument selectivity is 0.000028.

Adding Statistics To Non-Indexed Columns

Changing Requested Step Count

Changing Requested Step Count cont.

Changing Requested Step Count How Many Steps?

Changing Requested Step Count How Many Steps? Cont.

Changing Requested Step Count How Many Steps? Cont.

The number of steps from upgrade is a good starting point

Changing Requested Step Count How To Do It

Writing The Statistics Directly

Maintaining Directly Written Statistics

Changing Statistics - Data Skew and The Statistics

Changing Statistics - Data Skew and The Statistics cont.

Changing Statistics - Data Skew and The Statistics cont.

Changing Statistics - Data Skew and The Statistics cont.

Changing Statistics - Unknown Equi-SARG Values

Change the Total Density value

Changing Statistics - The Default Selectivity Values

Use optdiag to change the default selectivity values

(0.33) used (0.25)

used 0.0033 used 0.0025

The default selectivity values are persistent

Changing Statistics - SARG Values That Are Out Of Range

Special costing is done Not always the most accurate value:

Changing Statistics - SARG Values That Are Out Of Range

Add a dummy boundary value to the histogram via optdiag

Change last boundary value to Aug 1

Manipulating Histogram Cells

Writing Statistics In Place Or Running Update Statistics

Writing Statistics In Place Or Running Update Statistics cont.

More Optimizer Resources and Help

More Optimizer Resources and Help cont.

The Sybase list server

The external Sybase FAQ

You might also like