
A Technical Journal for the PASS Community
INTRODUCTION TO PERFORMANCE TUNING ON SQL SERVER 6 USING T-SQL CODEGEN FOR AUDIT TRAILS 14 PERFORMANCE TUNING USING SQL PROFILER 18 UNDERSTANDING QUERY EXECUTION PLANS TO OPTIMIZE PERFORMANCE 22 SQL CLR - HOW, WHY AND WHY NOT 27 DBA 101 - PERFORMANCE TUNING 101 33

Visit us on the World Wide Web at www.sqlserverstandard.com

MAY/JUNE 2007 ISSUE

THE PROFESSIONAL ASSOCIATION FOR SQL SERVER (PASS) PROVIDES A WIDE ARRAY OF YEAR-ROUND BENEFITS
JOIN A THRIVING COMMUNITY WITH MORE THAN 11,000 MEMBERS WORLDWIDE!
As a member of PASS, you receive a number of benefits that support your SQL Server user needs. Whether you're looking for substantial savings on SQL Server-related products, services to support current business initiatives, or educational opportunities, PASS membership benefits you on a number of SQL Server-focused fronts.

Educational Value
Discount to the 2007 PASS Community Summit: PASS Premium members receive a $200 USD discount to the LARGEST event of the year dedicated exclusively to SQL Server education and training: the 2007 PASS Community Summit, September 18-21 in Denver, Colorado. Register early to save!
Subscription to SQL Server Standard Magazine: With articles that appeal to developers, DBAs and Business Intelligence professionals, PASS members have access to the information and tools to help them develop their careers. Members receive 6 issues per year. International members are able to access the most current editions online.

Networking Value
Chapters (Regional User Groups): PASS provides a network of 100 official chapters/affiliated groups worldwide that offer valuable education and networking opportunities on a local level. For more information on finding a chapter in your area or starting a chapter, please contact pass_chapters@sqlpass.org.
Special Interest Groups (SIGs): PASS members have the option to join a variety of Special Interest Groups (SIGs) including DBA, AppDev and BI. SIGs connect PASS members from around the globe who have similar interests and face similar challenges. Visit http://sigs.sqlpass.org

Online Value
Access to Online Conference Proceedings: Only PASS members have access to the extensive source of SQL Server information from MVPs, Microsoft developers and user-experts who have presented at previous PASS user events. Job Target: This new online career resource helps connect employers looking for qualified employees and professionals looking for a new opportunity.

PASS offers two levels of membership: Premier ($150/year) and Basic (FREE online membership). For more information or to join, visit http://www.sqlpass.org/membership/. Check out the new PASS SIG Web site today! Visit http://sigs.sqlpass.org
Register for the SIG Web (free of charge) and gain access to all of these great tools:
- Book Reviews to keep you up-to-speed on useful industry resources
- SQL Server articles and interviews with influential industry leaders
- Blogs allowing you to voice your opinions and share information
- Educational Webcasts
- And much more!
How can you get involved? Submit an article, script, link, write a book review or volunteer. There are many ways to get involved! Register on the SIG Web site and find out how!

PERFORMANCE TUNING
When it comes to working with databases, performance tuning is one of those things that everyone eventually needs to think about. My personal opinion is that I'd rather think about it well before my systems go into production. I make sure that I am thinking about performance when I am designing my data structures, when I am writing my queries, and as I am going through quality assurance. Even with keeping all of this in perspective, there are always things that are missed. While the structures and queries that are designed for applications might be optimal when the system rolls out, time changes the performance curve of a system. Unexpected usage patterns, increased data volumes and hardware restrictions all come into play in a production system. How often do you intentionally induce index fragmentation into a test environment to see how your application is going to perform? If you don't, try it some time. It can be a real eye-opener.

If your primary focus is that of a database developer, you need to make sure that you work with your production staff to ensure that they have the knowledge they need to support your systems in a production environment. If, on the other hand, your primary focus is that of a production database administrator, make sure that you engage your developers early in the course of a project so that your physical environment can be factored in when the system is designed. Communication is the key here.

Tuning databases is as much an art as it is a science. If you have the luxury of a test environment that closely matches your production environment from a specification standpoint, my suggestion would be to try out a few different ways of doing things. Indexes usually give you the most bang for the tuning buck, but they do come with a price: additional overhead when writing data. Rewriting queries can completely change the way that things perform. If you have a slow query, write it a few different ways and compare the performance. These are just a few ideas to get you started.

I hope that some of the ideas put forth in this issue will help you walk a little further down the database tuning road. If you have any comments, please send them to me at editorial@sqlpass.org. Happy tuning!

Editor In Chief: Chuck Heinzelman
Managing Editor: Susan Page
Copy Editor: Susan Page
Tech Editors: Darren Lacy, Kathi Kellenberger, Frank Scafidi, Adam Machanic
Graphic Design: Erin Agresta
Printing: NBS-NorthBrook Services
Advertising: Lesley MacDonald (lesley@ccevent.com)
Subscriptions and address changes: Wayne Snyder (wayne.snyder@sqlpass.org)
Feedback: Chuck Heinzelman (chuck.heinzelman@sqlpass.org)
Copyright: Unless otherwise noted, all programming code and articles in this issue are the exclusive copyright of The Professional Association for SQL Server (PASS). Permission to photocopy for internal or personal use is granted to the purchaser of the magazine. SQL Server Standard is an independent publication and is not affiliated with Microsoft Corporation. Microsoft Corporation is not responsible in any way for the editorial policy or other contents of this publication. SQL Server, ADO.NET, Windows, Windows NT, Windows 2000 and Visual Studio are registered trademarks of Microsoft Corporation. Rather than put a trademark symbol in each occurrence of other trademarked names, we state that we are using the names only in an editorial fashion with no intention of infringement of the trademark. Although all reasonable attempts are made to ensure accuracy, the publisher does not assume any liability for errors or omissions anywhere in this publication. It is the reader's responsibility to ensure that the procedures are acceptable in the reader's environment and that proper backup is created before implementing any procedures.

Chuck Heinzelman
PASS Director of Technical Publications
If you are interested in writing an article for the SQL Server Standard, please contact me at editorial@sqlpass.org. I'll get you a copy of the editorial calendar, which includes the editorial focus for each of the next few issues and the deadlines for article submissions.

PASS Summit Pre-Conference Sessions 2007


Monday, September 17, 2007
Inside T-SQL Querying, Programming and Tuning - Putting Your Knowledge into Action
Presented by Itzik Ben-Gan
From the author of the bestselling books Inside Microsoft SQL Server 2005: T-SQL Querying and Inside Microsoft SQL Server 2005: T-SQL Programming, this seminar is jam-packed with practical advice for T-SQL querying, programming and tuning. The seminar covers practical problems T-SQL programmers face daily, providing different solutions for each problem, and explains in detail how to tune your code to produce robust and efficient applications.
Speaker Biography: Itzik Ben-Gan is a Mentor and Founder of Solid Quality Learning. A SQL Server Microsoft MVP since 1999, Itzik has delivered numerous training, mentoring and consulting events around the world focused on T-SQL querying, query tuning and programming. Itzik is the author of Inside Microsoft SQL Server 2005: T-SQL Querying (MSPress), Inside Microsoft SQL Server 2005: T-SQL Programming (MSPress), and a co-author of Advanced Transact-SQL for SQL Server 2000 (APress). He has written many articles for SQL Server Magazine as well as articles and whitepapers for MSDN. Itzik is the founder and manager of the Israeli SQL Server User Group.

Microsoft AS Client Tools: Choosing and Using
Presented by Reed Jacobson
Do all the new client-side tools now available for Analysis Services have you excited? Or perhaps a little confused? Just a couple of years ago, there was a dearth of client-side tools, but with the ProClarity acquisition and the release of the Office 2007 System, there are now tools that seem to overlap as much as complement each other. This intense pre-conference session will give you the tools you need to make the right implementation choices. Working through all the major Microsoft client tools, you'll learn:
- Critical implementation skills for each tool
- Where each tool is the strongest and weakest
- Where the tools complement each other and where they overlap
The session will cover the Excel 2007 client, SharePoint 2007 Excel Server, ProClarity desktop, ProClarity Enterprise Server, Business Scorecard Manager, Reporting Services, and the Office 2003 Web Components. This is not a marketing session. It's a down-and-dirty look at both the good and the bad so that you can make the best decisions providing client tools to your users.
Speaker Biography: Reed Jacobson has been working with Business Intelligence for over twenty years. Last year, he developed the pre-release course on SQL Server 2005 BI for Microsoft, and delivered it to partners around the world. He has written several books for Microsoft Press.

Additional Pre-Conference Seminars for September 17 and September 18, 2007 to be announced.

Tuesday, September 18, 2007


Controlling and Reusing Query Plans
Presented by Kalen Delaney
This session will cover techniques available for determining if, when and how you should override the optimizer's choice of query plans. It will also cover how to determine if your plans are being reused, how you can determine if reuse is desirable for your queries, and how you can control query caching behavior.
Speaker Biography: Kalen Delaney has worked with SQL Server for 19 years, starting with employment with the Sybase Corporation in 1987. She worked at Sybase in technical support and training until 1992, when she became an independent trainer and consultant. In 2002, Kalen and 5 colleagues formed Solid Quality Learning, a partnership committed to offering the most advanced SQL Server training in the world. In addition to teaching and consulting, Kalen has a monthly column in SQL Server Magazine, and has written or contributed to several advanced SQL Server books. Her most recent book, Inside SQL Server 2005: The Storage Engine, was published in 2006.

Database Maintenance for SQL Server 2005
Presented by Andrew Kelly
This session is intended to walk attendees through all of the different aspects of database maintenance for SQL Server 2005. It will cover not only the types of tasks but also solid examples of the code and techniques used to perform them, including but not limited to integrity checks, reindexing, backups, clean-up, monitoring, updating statistics, and error and log checking.
Speaker Biography: Andrew Kelly is a Mentor with Solid Quality Learning based out of Plaistow, NH. He has over 15 years of experience with relational databases and application development on both Unix and Windows platforms. He is a SQL Server MVP and currently does high-level training and SQL Server consulting with many different clients throughout the world.

TABLE OF CONTENTS
INTRODUCTION TO PERFORMANCE TUNING ON SQL SERVER . . . . . . . . . . . . . . . . . . . . . . . . . . .6
If you are new to performance tuning, it can seem quite a daunting task. This article will give you a good idea of where to look to determine the root cause of your performance problems, as well as some methods for solving them. By Wayne Fillis

USING T-SQL CODEGEN FOR AUDIT TRAILS . . . . .14


Have you ever needed to do complex audit trails on your data? If you have, then you know how involved maintaining audits can be, especially if you have a table structure that hasn't been finalized. In this article, you will learn how to automate the generation of auditing code. By Paul Nielsen

PERFORMANCE TUNING USING SQL PROFILER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .18


One of the first things you need to know when tuning database performance is what is causing the performance problem. SQL Server Profiler is a powerful tool that can help you gather that information, and it is included with SQL Server right out of the box. By K. Brian Kelley

UNDERSTANDING QUERY EXECUTION PLANS TO OPTIMIZE PERFORMANCE . . . . . . . . . . . . . . . .22


When tuning a query, the plan that is used to execute the query can be extremely useful and extremely overwhelming to the untrained eye. In this article, you will learn about the different ways to view a query plan as well as how to interpret the results once you have them. By Scott Klein

SQL CLR - HOW, WHY AND WHY NOT . . . . . . . . .27


With the release of SQL Server 2005, Microsoft introduced the .NET Common Language Runtime into the database engine. This functionality can be extremely beneficial, but it has the potential to be misused. In this article, we'll not only talk about how to use the CLR within SQL Server, but when and when not to use it. By Chuck Heinzelman

DBA 101 - PERFORMANCE TUNING 101 . . . . . . .33


Welcome to a new feature column for the SQL Server Standard. The goal of the DBA101 column is to put an entry-level perspective on the editorial focus of the issue. In this issue, we will take an ultra-high-level look at the core concepts behind performance tuning.

INTRODUCTION TO PERFORMANCE TUNING ON SQL SERVER


BY: Wayne Fillis

Introduction
Performance Tuning in SQL Server is an art, and so involved that some DBAs specialize in this field. Every system developed has scope for improvement, and it is usually the DBA's job to fix the inevitable performance problems that develop over time. I can't say that I am a SQL Server guru, or even an expert, but I hope to share with you some of the things I have learned over the past 7 years that I have been involved with SQL Server. Most of what I have to say focuses on SQL Server 2000, but it is also relevant for SQL Server 2005. Arguably, the 2 main areas of performance tuning that a DBA would focus on are:
1. Hardware tuning (the Big Three)
2. Query tuning
Let's start with The Big Three: Memory, IO and CPU.


The Big Three (Hardware Tuning)


When an application performs poorly, you can usually see an impact on Memory, IO or CPU. Sometimes this is a symptom of a problem, such as poorly performing queries, but occasionally you have a real hardware problem. The trick is to identify if your hardware is the symptom or the cause, and this ability comes with experience.

Memory (RAM)
Memory Allocation: SQL Server allocates memory on an as-needed basis, but will basically take as much memory as it can. Memory is primarily used for the data buffer cache, as well as a cache for compiled queries and stored procedures. Memory speeds up data access, because if SQL Server can find the requested data in memory it does not need to generate an expensive disk access to retrieve the data. Memory is also used for a variety of other processes.

32-bit hardware: Each Windows application running on 32-bit architecture is only able to access 4 GB of RAM. Two GB is reserved for the Operating System (OS), and 2 GB is used by the application, which in our case is SQL Server. By using the /3GB switch in the boot.ini file, you can change this behavior and reserve 3 GB for SQL Server and 1 GB for the OS.

AWE: SQL Server 2000 Enterprise and Developer Edition is AWE enabled. AWE stands for Address Windowing Extensions, and is available on Windows 2000/2003 Advanced Server and Datacenter Server. By enabling AWE on the OS (using the /PAE switch in boot.ini), and in SQL Server (in the configuration options), you will be able to access 8 GB of RAM with Windows 2000 Advanced Server (32 GB with Windows 2003 Advanced Server) and 64 GB with Windows 2000 Datacenter Server. For more information, see the article Enabling AWE Memory for SQL Server at http://msdn2.microsoft.com/en-us/library/ms190673.aspx.
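If you do go the AWE route on 32-bit hardware, the SQL Server half of the change is a small configuration sketch like the one below (the boot.ini switches are edited separately, and the max server memory value shown here is only a placeholder for your own sizing):

-- Enable AWE in SQL Server (Enterprise/Developer editions); requires a service restart
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'awe enabled', 1;
-- With AWE on, SQL Server 2000 does not manage memory dynamically, so cap it explicitly
EXEC sp_configure 'max server memory (MB)', 6144;   -- placeholder value
RECONFIGURE;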

You also need to bear in mind that there is a performance overhead to using AWE, as the memory addresses need to be mapped to use the higher range of RAM. Also, only the relational engine data cache can use AWE memory. This can be avoided completely by using 64-bit hardware.


64-bit hardware: 64-bit Itanium 2 hardware running Windows Server 2003 can directly address up to 1,024 GB of physical memory, and 512 GB of addressable memory. SQL Server 2000 (64-bit edition) can access up to 512 GB. Furthermore, all parts of SQL Server can use this memory. A Microsoft article entitled SQL Server 2000 Enterprise Edition (64-bit): Advantages of a 64-Bit Environment can help you make a choice between 32-bit and 64-bit platforms. Note that in the past few years, SQL Server 2005's increased compatibility with newer non-Itanium 64-bit platforms has opened up huge possibilities for scaling out your systems. Watch this emerging market carefully; it could take your applications to new levels.

Performance Monitor: You can use Performance Monitor (perfmon) from Control Panel / Administrative Tools to monitor the Buffer Cache Hit Ratio counter (under Buffer Manager), to see if data pages are being flushed from memory because there is simply not enough memory to store the data pages for a long enough period. If you monitor the perfmon counters Statistics / Batch Requests/sec, SQL Compilations/sec, and SQL Re-Compilations/sec, you can see if your stored procedures are being flushed from the procedure cache too quickly due to memory constraints. You can also look for recompilations of stored procedures in the system table syscacheobjects.

Adding more memory: Adding more memory to your server can often solve many performance problems, but try to first improve your worst queries and any application design issues. Treat the cause, not the symptom.
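One quick way to look at the procedure cache and the syscacheobjects table mentioned above (SQL Server 2000; the TOP clause is just to keep the output manageable):

SELECT TOP 50 cacheobjtype, objtype, usecounts, LEFT(sql, 80) AS sql_text
FROM master..syscacheobjects
ORDER BY usecounts DESC;   -- plans with low usecounts and many near-duplicates suggest poor reuse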

IO

Upgrading your disks: Disk usage, or IO, problems are possibly the easiest to diagnose and at the same time the most difficult part of your system to remedy. If you are not using a disk array or a SAN (Storage Area Network), adding or upgrading your physical disks can be a time consuming task, involving considerable application down-time. Even with a disk array or SAN, down-time is often inevitable.

Tricks to minimize impact: However, there are some tricks you can use to minimize the hit on your disk subsystem. A company I used to work at had recurrent disk problems on the same day of every week: the day reporting processes were run. The first thing we did was to run perfmon and monitor the counters called % Idle Time and Avg. Disk Queue Length (under Physical Disk). You will always see some queue length (this indicates that the disks cannot handle requests fast enough), but if the idle time or queue length is worse than normal then you can assume your disks are being thrashed. This is not good, as it will have a ripple effect throughout your applications. You may even notice your CPU dropping as SQL Server is throttled back by the slow disks. At this point, I think it is a good time to note that you should ideally run perfmon on your live servers frequently, so you can see what statistics you get under normal load. Then, when the system is behaving poorly, you will recognize anomalies more clearly.

Idle Time: At this company we noticed the idle time was consistently at 0% on the data disks. We had already moved the transaction logs onto separate drives as a performance enhancement, but the disk being hammered was the main disk where our data files were being stored. A DBA noted that the tempdb database was also stored on these same disks, and the reporting processes made heavy use of sorting; hence, they used tempdb. We realized that tempdb was growing steadily in size. A quick check revealed that, to our horror, the auto growth size for our tempdb log file was set to 1 MB. It is an expensive IO process to grow a log or data file, and this was causing our idle time to plummet. We went through all our data and log files, and ensured the autogrow was set to 20%. If you have inherited a system, you might assume that the environment is configured correctly; but be warned, it is not always so.

Tempdb optimization: If your applications make heavy use of tempdb, you can optimize by moving the database onto its own mirrored drives, and by adding data and log files so they equal the number of logical processors (CPUs). SQL Server will spread the load across the data files, thus reducing overhead when creating new objects. I have been told you can also get around the bottleneck of creating new objects by allocating a tempdb data file of greater than 4 GB, though I have not tested this. Be warned, though: there is a bug in SQL Server 2000 when you have multiple tempdb data files. Whenever SQL Server is restarted, the autogrow is set to 1 MB. You can remedy this with a SQL Agent job which runs when SQL Server starts, and resets the auto grow to 20%.
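A sketch of the kind of fix described above. The logical file names below are the tempdb defaults (tempdev/templog), and the added file's path and size are placeholders, so check sysfiles for your own names first:

ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILEGROWTH = 20%);
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILEGROWTH = 20%);
-- Optionally, one data file per logical CPU to ease allocation contention (placeholder path/size)
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf', SIZE = 1024MB, FILEGROWTH = 20%);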


The problem is with scans: Queries which do Index Scans, Table Scans and Clustered Index Scans on large tables or indexes also use a lot of IO and memory (the data read needs to be stored in memory). Occasionally a scan is the best technique for reading data, but a new index can often remove the need for the scan. If your query scans a very large table, you are performing a large amount of potentially unnecessary IO. You can monitor the level of scans on your system by using the perfmon counter Access Methods / Full Scans/sec, and by using the SQL Profiler events Scan:Started and Scan:Stopped. I will cover this in more detail later in the article.

CPU

CPU intensive: SQL Server is also CPU intensive, though this largely depends on the nature of your code and the access method chosen by SQL Server.

Compiles: One thing I have noticed that uses a significant amount of CPU is compiles. Whenever a stored procedure is run for the first time, the code is compiled and the access method (for example, what indexes to use) is determined. This is a very CPU-intensive process. I once saw a stored procedure take 18 seconds to compile, and 1 second to run. SQL Server will cache compiled plans and reuse them. Using sp_executesql as a parameterized query can often result in the plan being reused, but SQL code embedded in your application will almost always compile every time it runs. Monitoring the perfmon counters Statistics / Batch Requests/sec, SQL Compilations/sec, and SQL Re-Compilations/sec will show if you have an excessive number of compiles taking place. Try converting embedded SQL to stored procedures, and you will find your number of compiles will drop.

Query Performance Tuning Guidelines

Indexes

Indexes are crucial: Table indexes are crucial to query performance. An index is a structure similar to the inverted branches of a tree, and is used to quickly access data. They are like the index of a book: instead of reading the book from cover to cover to locate one sentence, simply use the index. Reading the book from cover to cover is called a Scan, and it is a very time consuming process. If SQL Server does not have an index to use, it will perform a scan of the table. By adding appropriate indexes to your tables you can dramatically reduce the cost and execution time of your queries. There are a few tools that can assist you in choosing your indexes, and we will cover this later in the article.

Index Fragmentation: Clustered index fragmentation is also an important consideration in query performance. A Clustered Index is an index that determines the physical sort order of the table data. As you insert, delete and update rows in the table, the physical sort order on disk gets out of sync; SQL Server maintains logical sort order with the use of pointers. Running DBCC SHOWCONTIG on the table will display the level of fragmentation; to identify the levels of fragmentation, look at Scan Density (should be close to 100%), Logical Scan and Extent Scan Fragmentation. I recommend running this after hours, as SHOWCONTIG takes a shared lock on the table. If you have high fragmentation levels, consider running DBCC DBREINDEX or DBCC INDEXDEFRAG to remove the fragmentation. Again, these should ideally be run after hours as they are intensive operations. Another step to take is to set up weekly SQL Server Maintenance Plans to optimize your indexes. Note that you can only reorganize a table which has a clustered index, though you should ensure all of your tables have one of these.
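The commands just described, in a form you could adapt (the table, index and database names are placeholders):

DBCC SHOWCONTIG ('dbo.Orders');   -- check Scan Density (close to 100% is good) and fragmentation
DBCC INDEXDEFRAG (MyDatabase, 'dbo.Orders', 'IX_Orders_CustomerID');   -- lighter-weight, online defrag
DBCC DBREINDEX ('dbo.Orders', '', 90);   -- full rebuild of all indexes with a 90% fill factor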


Fill Factor: Fill Factor is set independently for each index, and indicates the amount of space to use in your data pages, and how much to keep free for future use. Each data page is 8K in size, and you ideally want to squeeze as many rows as possible onto a page in order to get as many rows as possible for each page read. This reduces your IO. Fill Factor is used to keep free space available on the data pages, which is used by inserts and updates. If there is no free space on a data page and you need to insert a new row, then the page will split to make room for the new data. This is an IO intensive operation and causes fragmentation. The balance between low fill factor (more free space) and high fill factor is a balance between performance for reading, and performance for inserts and updates. One company I used to work at got the fill factor incorrect in their weekly Maintenance Plan to defragment the database's indexes. Instead of a fill factor of 90% (this means 10% free), the fill factor was accidentally set to 10% (90% free). The next morning the system was performing poorly. In fact, the users could not perform simple transactions against the database. A DBA noticed that the size of the database had grown much larger, and this was due to the increased amount of free space on the data pages. The number of IOs against the database had significantly increased, as multiple IOs were needed to retrieve the same data that one IO would have taken before the defragmentation took place. I recommend the use of fill factor, but try to keep it around 80-90%. Alternatively, leave it set at zero for SQL Server to maintain fill factor itself.

Covering Index: When SQL Server processes a query it uses the index to find the data page, then reads the page to access the remainder of the columns that the query needs. A Covering Index is used to radically improve performance of a query by including all these extra columns in the index itself. The result is a bloated index that can at times be almost the same size as the table itself. There will be an overhead for deletes, updates and inserts to the table, but the select query will usually be improved substantially. When adding covering indexes, you will need to make a judgment call regarding performance between updates vs. reads.
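A hypothetical covering index for a query that selects OrderDate and OrderTotal for a given CustomerID (table and column names are made up). On SQL Server 2000 the extra columns go into the key, while SQL Server 2005 can carry them in an INCLUDE clause:

-- SQL Server 2000 style: every needed column in the key
CREATE NONCLUSTERED INDEX IX_Orders_Customer_Covering
ON dbo.Orders (CustomerID, OrderDate, OrderTotal)
WITH FILLFACTOR = 90;

-- SQL Server 2005 alternative: non-key columns carried at the leaf level only
-- CREATE NONCLUSTERED INDEX IX_Orders_Customer_Covering
--   ON dbo.Orders (CustomerID) INCLUDE (OrderDate, OrderTotal) WITH (FILLFACTOR = 90);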



Temporary Tables and Table Variables


I often use a Table Variable, or Temporary Table, to pre-prepare data in stored procedures. If you need to prepare or format data in inner sub-queries or correlated sub-queries, it is sometimes better to create a table variable or temporary table and insert the data into the table before your main query runs. The advantage of this process is that you can index the temporary table or table variable for better performance. Try solutions with both a table variable and a temporary table and compare the results. Be careful, though, because the Estimated Execution Plan subtree cost (explained later in this article) does not always show the impact a table variable will have in the query; check your actual runtime. There is some confusion about the use of table variables vs. temporary tables, but the agreed recommendation is to test both scenarios. Temp tables usually give better performance for large amounts of data.
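A rough illustration of the two approaches being compared (the table and column names are made up). Note that a table variable can only be "indexed" through its PRIMARY KEY or UNIQUE constraints, whereas a temporary table accepts ordinary indexes:

-- Table variable
DECLARE @Totals TABLE (CustomerID INT PRIMARY KEY, OrderTotal MONEY);
INSERT @Totals (CustomerID, OrderTotal)
SELECT CustomerID, SUM(TotalDue) FROM dbo.Orders GROUP BY CustomerID;

-- Temporary table, with an extra index added after the load
CREATE TABLE #Totals (CustomerID INT PRIMARY KEY, OrderTotal MONEY);
INSERT #Totals (CustomerID, OrderTotal)
SELECT CustomerID, SUM(TotalDue) FROM dbo.Orders GROUP BY CustomerID;
CREATE NONCLUSTERED INDEX IX_Totals_OrderTotal ON #Totals (OrderTotal);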

Avoiding Scans
As mentioned previously, a scan could either be a Table Scan, Index Scan, or a Clustered Index Scan. Scans involve reading all or part of a table or index from start to finish. Sometimes, if the table being scanned is small, a scan is a more efficient way to read the data than using an index. The same holds true if your query selection criteria includes a high percentage of the rows in the table (for example, a search on Gender = Male). A scan can often generate a high volume of IO, though this depends on the size of the table or index being scanned. You can identify when scans are running by monitoring the Scan:Started and Scan:Stopped events in SQL Profiler, and by using the perfmon counter SQL Server:Access Methods / Full Scans/sec. You can see the impact of a heavy scan by looking at the perfmon counters Physical Disk / Avg. Disk Queue Length and % Idle Time. If the Queue Length increases, and the % Idle Time drops significantly at the same time that Full Scans/sec increases, then the scan is likely to be causing an IO problem. The scan could also be flushing data out of your data cache (which resides in memory), and causing a subsequent memory overload. The trick is to identify which queries are causing the scan. To do this, run a SQL Profiler trace at the same time you run perfmon, and try to limit the trace by filtering on CPU or Duration (in milliseconds) and Reads to pick up the expensive queries. The query that is running at the exact time you see a significant dip in % Idle Time is potentially causing IO problems.


Triggers and Cursors


Cursors are resource pigs: My favorite author, Robert Vieira, says in his book Professional SQL Server 2000 Programming (WROX - ISBN 1-861004-48-6): "remember to avoid cursor use wherever possible. Cursors are a resource pig, and will almost always produce 100 times or worse negative performance impact ... Cursors are meant to be used when no other options are available." I don't think I can add much more to that. Don't use a cursor unless you really need to. SQL Server is designed and optimized to be set-based, and you can do almost everything you need to do in single or multiple queries.

Triggers: Triggers are another potential performance bottleneck. You can create triggers on a table, which fire when you insert, delete or update data in the table. Triggers are usually used to log updates or to maintain referential integrity. I used to work at a company where they used multiple triggers per table to maintain denormalized data, and default and status flags on tables. Whenever a single update took place, on one table in particular, four triggers would fire. Each trigger had a dozen lines of code, with multiple selects and updates. A single update or insert took ten times longer than it would have otherwise, and while the triggers were running the update held its locks on the data. You should get better performance and concurrency by putting the trigger code in stored procedures, and it will definitely be easier to maintain and debug.


Common pitfalls
A colleague of mine once advised me that before I run execution plans or look at runtimes of queries, I should just take a look at the query code and check for some obvious mistakes. Here is a list of potential problems to look out for:
1. Remove NOT IN: if the query inside the NOT IN returns a large number of rows, you are going to hit the tempdb database heavily and invoke excessive IOs. I recently saw a junior DBA write a NOT IN which grew tempdb to 20 GB, and it would have continued to run if the disk drive did not run out of disk space and crash the query. Replace your NOT IN with a LEFT OUTER JOIN (see the sketch after this list).
2. Remove User Defined Functions and system functions in the WHERE clause. While UDFs in the SELECT clause can provide excellent performance enhancements, a UDF in the WHERE clause can kill your query. The worst part of this is that the Estimated Execution Plan does not show the cost of UDFs in the total subtree cost. What is going to happen if you use a UDF or system function in your WHERE clause is that potentially every row is going to be passed through the function, and the result will be used to filter your data. If this equates to thousands of rows, your query is going to run for a while. You may get better performance by applying the function in a pre-select, placing the result into an indexed table variable, and using the table variable in your query.
3. Unnecessary ORDER BY can cause excessive IOs by sorting. Remove ORDER BY if you do not need it.
4. Limit the number of columns you are returning. This will reduce IO and network traffic.
5. Search queries with LIKE should be avoided where possible, especially LIKE on varchar(2000) fields, for example. Rather, set up a clustered index on the field you are searching on, or configure Full-Text Search.
6. LEFT OUTER JOIN can potentially slow down your query, especially if the field you are joining on contains many rows of NULL data. If you can use an INNER JOIN, then use it instead of LEFT OUTER JOINs or RIGHT OUTER JOINs.
7. If you are joining on a field that contains many rows of NULL data, you could see your query performing badly. Try not to join on that field.
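A sketch of the rewrite suggested in item 1, using made-up Customer and Orders tables (find customers with no orders):

-- Original form; a large subquery here can hammer tempdb
SELECT c.CustomerID
FROM dbo.Customer AS c
WHERE c.CustomerID NOT IN (SELECT o.CustomerID FROM dbo.Orders AS o);

-- Rewritten with a LEFT OUTER JOIN
SELECT c.CustomerID
FROM dbo.Customer AS c
LEFT OUTER JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID
WHERE o.CustomerID IS NULL;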

Query Performance Tuning Tools of the Trade

Estimated Execution Plan


The Estimated Execution Plan is a very useful guide to the cost of your query. Note that the plan does not show some things, such as the cost of User Defined Functions. To understand the graphical plan, you need to read from right to left. Hover your mouse pointer over an icon, and you will be shown what that part of the query is doing. It will show the subtree cost, the type of access


and the columns being retrieved. SQL Server 2005 Management Studio shows more detailed information than SQL 2000's Query Analyzer. A thick line joining icons indicates large volumes of data being moved around, and could highlight places where Scans are taking place. Hash Joins are bad, and can often be caused by a Scan at a previous step in the plan. An important part of the plan is that each icon shows the percentage of cost that the step takes up in the entire query. You can quickly see which sections of the query form the most expensive and time consuming portion, and this helps to resolve the most important issues first. To see the overall cost of the query, hover the mouse pointer over the top-left-most icon. If the subtree cost displayed is less than one, then this query will be effective for a front-end (GUI) and web-based application requiring quick access times. Costs of between 1 and 3 are adequate, but could be tuned. A cost over 3 or 5 is potentially bad. For a back-end process, I have seen costs over 30 or 80 for search or reporting stored procedures. The real-time runtime of the query depends on the hardware it runs on, so a cost of 5 on one server could run for the same time that a lower cost query runs on a slower server. It all depends; trial and error is the name of the game here.

Figure 1: Estimated Execution Plan

Database Engine Tuning Advisor for SQL Server 2005

The Database Tuning Advisor (DTA) in SQL Server 2005 is a great tool, but it seems to have some bugs. When running the tool against a SQL 2000 database I get some error messages, but it still seems to do the job correctly. When run against a large trace file, the tool doesn't work at all. However, the DTA tool will take a query or short Profiler trace and quickly recommend indexes and statistics to improve your query. The quickest way to use the tool is to open your query or EXEC statement in SQL Server 2005 Management Studio (SSMS), highlight the query, right-click and select Analyze Query in Database Engine Tuning Advisor. Once the tool opens, select your database in the Database for Workload Analysis drop-down, tick your database in the list of available databases, and click Start Analysis. When analysis has completed, you can select the recommendations one after the other and copy them to the clipboard. Recommendations will be indexes or statistics you can add to improve the query. The trick here is to run the recommendations one after the other, and to check the Estimated Execution Plan between each recommendation. Some recommendations make no difference to the cost of the query and can be ignored.

Figure 2: Database Tuning Advisor

SET SHOWPLAN

The SET SHOWPLAN_ALL ON command is an alternative to the Estimated Execution Plan. To use the command, enter it in the Query Analyzer or SSMS query window just before your query. When you run the commands, your query is not actually run. Instead, you are presented with a detailed analysis of how the statements will be executed. I like to cut and paste the results into Excel, as I find it easier to navigate than the Query Results window.
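A minimal usage sketch (the query itself is a placeholder); the SET statement must be in its own batch:

SET SHOWPLAN_ALL ON;
GO
SELECT CustomerID, OrderDate FROM dbo.Orders WHERE CustomerID = 42;
GO
SET SHOWPLAN_ALL OFF;
GO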



Once the results are in Excel, you can easily see the same information as the Estimated Execution Plan, but in text format. I sometimes find SHOWPLAN easier to work through for complicated queries than the graphical Estimated Execution Plan.

Figure 3: Showplan results

SET STATISTICS

The SET STATISTICS IO ON command works in a similar fashion to the previous command described, except that in this case your query is actually run. This command shows the disk activity taking place when your query is run. SET STATISTICS TIME ON is a command which shows the time required to parse, compile and execute your query.

Figure 4: Statistics IO results
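A minimal usage sketch of both commands against a placeholder query:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT CustomerID, OrderDate FROM dbo.Orders WHERE OrderDate >= '20070101';
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;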

Profiler

The SQL Profiler is a great tool for tracing your database and for seeing what queries are actually running. You can filter the trace on database, user, and a number of other criteria. There are a variety of different events and columns you can log, but generally the default settings are good enough. If you are looking for the worst queries, experiment with filtering on CPU, Duration and Reads. A colleague of mine uses CPU, but I generally filter on Duration greater than 1 second. This is not always indicative of a problem, as a query that runs for less than one second on the test server could run for 5 seconds on the production server under heavy load, or 20 seconds under abnormal load. If you experiment with the settings, you will be able to identify the queries causing the most load on your system. These are the queries you should be tuning.

Figure 5: Profiler

Monitoring Locks

An article on performance tuning is never complete without a discussion on locks and blocks. Generally speaking, locks are good and very necessary, but blocks are usually bad. Run the command select * from master..sysprocesses where blocked > 0 and spid <> blocked to see if blocks are taking place, and use sp_who2 to see more information about the spid. DBCC INPUTBUFFER (spid) will show you the command being executed by the spid. I am not going to go into too much detail here. What I will say is that blocking is sometimes an indication of a performance problem on your system. If your system is not letting queries through fast enough and they start blocking, then you need to treat the cause. Blocking is generally just the symptom, but it is a good indication that something is going wrong somewhere.
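The commands mentioned above, gathered into one snippet; the spid passed to DBCC INPUTBUFFER is a placeholder you would take from the first query's output:

SELECT spid, blocked, waittime, lastwaittype, cmd
FROM master..sysprocesses
WHERE blocked > 0 AND spid <> blocked;

EXEC sp_who2;             -- more detail on each spid involved
DBCC INPUTBUFFER (53);    -- last statement submitted by spid 53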

Conclusion
Sometimes when you have identified a performance problem, you just need more powerful hardware. I


would take a step back first and take a look at the Big Three discussed previously, as well as your queries. If you can tune your worst queries by 10%, then you have been able to improve the overall performance of your system. Each query that performs badly impacts negatively on every other query running at the same time, whether that query is part of your application or another application running on the same server. Query tuning works on the 80/20 rule: 80% of the work is used to tune 20% of the code, but once you have done that 20%, your systems should be running better than ever before. Lastly, check out the article SQL Server 2005 Waits and Queues; this is my bible for performance tuning. You will find the link under the References section.

References
Professional SQL Server 2000 Programming by Robert Vieira (WROX - ISBN 1-861004-48-6)
Microsoft SQL Server 2000 Performance Tuning Technical Reference by Edward Whalen, Marcilina Garcia, Steve Adrien DeLuca, and Dean Thompson (Microsoft Press - ISBN 0-7356-1270-6)
SQL Server 2005 Waits and Queues by Tom Davidson; updated by Danny Tambs; technical reviewer: Sanjay Mishra. Available at: http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/performance_tuning_waits_queues.mspx

Wayne Fillis' passion for computers started twenty years ago at the age of 14, when his parents bought him a ZX Spectrum. Sporting an impressive (in those days) 48K of RAM, the computer was used to write games (what else does a 14-year-old want to do besides play games?). After graduating with a computing diploma he started work as a DB2 mainframe COBOL programmer at a large insurance company, eventually moving to Visual Basic, PC COBOL and SQL Server 2000. He learned to love the power of set-based operations and was able to move his focus solely to SQL Server in 2004. He is passionate about technology and continues to enhance his knowledge and skills in whatever way possible.


USING T-SQL CODEGEN FOR AUDIT TRAILS


BY: Paul Nielsen

From time to time over the years I've obsessed over writing better audit trails. Maybe it's my history as a Data Systems Tech in the Navy, maybe I just like having proof that the database worked as advertised, but seeing the full history for any row makes good sense to me. There are several types of audit trails. The most basic uses a copy of the transactional table and simply stores the last values. A better audit trail writes a full history from the creation to the deletion of the row. I've also seen systems that keep a full history inside the transactional table by never updating a row; they just insert a new version of the row with the current values and flag that row as the active row.



The Audit Table


I prefer a separate audit table that can maintain a full history for every table. Of course, auditing blobs like VarChar(max), image, xml, and text columns is a problem for a single-table solution.

IF Object_id('Audit') IS NULL
CREATE TABLE dbo.Audit (
  AuditID BIGINT NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
  AuditDate DATETIME NOT NULL,
  HostName SYSNAME NOT NULL,
  SysUser VARCHAR(50) NOT NULL,
  Application VARCHAR(50) NOT NULL,
  TableName SYSNAME NOT NULL,
  Operation CHAR(1) NOT NULL,        -- u, d
  PrimaryKey VARCHAR(20) NOT NULL,   -- edit to suit
  RowDescription VARCHAR(50) NULL,   -- optional
  SecondaryRow VARCHAR(50) NULL,     -- optional
  ColumnName SYSNAME NULL,           -- required for i, u; not for d; should add check constraint
  OldValue VARCHAR(50) NULL,         -- edit to suit (NVarchar()? varchar(MAX)?)
  NewValue VARCHAR(50) NULL          -- edit to suit (NVarchar()? varchar(MAX)?)
  )
-- optimized for inserts, no non-clustered indexes

There are no non-clustered indexes on the audit table, to keep the table optimized for inserts. If the application makes it easy to view the audit history, a nightly process could flush the Audit table to an Audit History table that includes non-clustered indexes and would speed retrieval.
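The article does not define the history table, but a nightly flush along the lines it describes might look like this, assuming a dbo.AuditHistory table with the same columns as dbo.Audit:

BEGIN TRAN;
INSERT dbo.AuditHistory (AuditID, AuditDate, HostName, SysUser, Application, TableName,
    Operation, PrimaryKey, RowDescription, SecondaryRow, ColumnName, OldValue, NewValue)
SELECT AuditID, AuditDate, HostName, SysUser, Application, TableName,
    Operation, PrimaryKey, RowDescription, SecondaryRow, ColumnName, OldValue, NewValue
FROM dbo.Audit;
DELETE dbo.Audit;   -- or delete only rows older than a cutoff date
COMMIT;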

Audit Strategies

The strategic question is what data operations to audit. Updates should certainly be audited, but inserts and deletes raise a question. It can be argued that insert operations don't need to be audited because the original data can be viewed in the base tables, and updates are captured by the audit table. Not auditing inserts significantly reduces the size and performance load of auditing. The advantage of auditing insert operations is that it captures the full pedigree, or source, of the data. Without auditing insert operations, it becomes necessary to store user creation metadata in the base table rows. There are two options for auditing delete operations: writing a simple delete operation flag to the audit table and relying on the last insert or update audit to recreate the deleted data, or storing a verbose snapshot of the deleted row in the audit table. The first option, writing just a deleted timestamp and operation to the audit table, is an elegant solution. However, it carries a risk if the audit table contains less than a complete picture of the data. Not auditing insert operations beginning with the table's first row means that the delete flag option is insufficient. Because most databases see more inserts than deletes, and it's highly likely that the audit system will be applied to databases already in production, it makes sense to me to not audit insert operations, and to store a verbose record of delete operations.

For the sake of options, however, I'll include the insert operation trigger and both types of delete in the sample code.

Creating the Triggers


With the audit table in place, the next step is to program the triggers to log the insert, update, and delete transactions into the table. It's important that the audit trail is handled by triggers and not by a call in a stored procedure (for example, the pAddNewCustomer stored procedure might call the pWriteAudit stored procedure and pass to it the new customer data to be stored in the audit trail). I was recently hired to diagnose a database that was experiencing data integrity issues. The database had an audit trail, but it was maintained by stored procedures instead of triggers. Unfortunately, some stored procedures omitted the call to pWriteAudit. And guess which stored procedures had issues that needed diagnosing?

I've developed large databases with full audit capability and I can tell you that manually writing audit trail triggers for every table is a royal pain. One huge lesson that I learned the hard way is that if the audit trail is built before the data schema is rock solid, the work required to maintain the audit trail is just not worth it. For one project I think I wasted half of my time just keeping the audit triggers in synch with the evolution of the project. There has to be a better way. That's why I developed the T-SQL dynamic audit trigger in 2001 and wrote about it in the SQL Server 2000 Bible. It's a simple system that uses a minimalist trigger to pass some data to a common stored procedure (table name, PK, inserted and deleted tables in temp tables, etc). The stored procedure then examines the bits from the Updated() function and generates inserts to the audit table for every affected column. Very cool. Very slow. But it's easier than hand-writing a fixed audit trail for every table.

To summarize the problem space: fixed, hand-coded audit triggers are specific to the table; they are lean and fast, but they're tedious to write. Dynamic triggers are easy to apply, but costly for performance. Wouldn't it be cool to run a stored procedure that examined the table's metadata and used code generation to build a perfect fixed audit trail trigger? When the table's schema changes, you could simply rerun

the procedure and easily keep the audit trail triggers up-to-date. You get the speed of the fixed audit trigger without the pain. Also, code-generated code is terribly consistent. I've been meaning to write this for about a year, since I wanted to include it in the SQL Server 2005 Bible, but just didn't have the time. Four events coalesced for me recently regarding audit trail triggers. First, I saw a blog entry stating that the only way to build a dynamic audit trail trigger was to use CLR. UGHGG!?!?! Second, the client I mentioned really needs a good audit trail and I don't want to write the code by hand. Third, a reader wrote asking for more information and advice on implementing the T-SQL dynamic audit trail trigger that I wrote about in the SQL Server 2000 Bible, and I wanted to give him a better solution than my old dynamic trigger. And fourth, I'm starting to plan the next edition of the SQL Server Bible, and on the list of new chapters was this idea for a better audit trail method.

AutoAudit
So, the AutoAudit stored procedure accepts a schema and table name, adds created and modified columns to the table, and code-generates audit triggers for the table. The stored procedure also creates the audit table (if it's not already there). The AutoAuditAll stored procedure simply calls the AutoAudit stored procedure for every table in the database (except, of course, the audit table). You can download the AutoAudit scripts from www.SQLServerBible.com. The main script includes all the stored procedures, and a test script executes AutoAudit against a couple of tables in AdventureWorks. I have to point out that AdventureWorks tables include a column called ModifiedDate. Running AutoAudit on AdventureWorks means that it has two audit systems in place. Since AutoAudit automatically audits every base table column (with certain data type exceptions), you'll see the ModifiedDate column in the triggers, even though it's not a smart idea to audit changes to the column used to audit the last modified date. If AdventureWorks were a production database, the fix would be to manually edit the trigger and remove the extra code. At the time of this writing, AutoAudit is up to version 1.07. This version is limited to tables with single-column primary keys, but I'm working on a version that will handle composite primary keys; it may be complete by the time you read this.
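The article does not show the procedure's exact signature; assuming its parameters match the @SchemaName and @TableName variables used in the generation code below, a call might look like this:

-- Hypothetical calls; parameter names are an assumption
EXEC dbo.AutoAudit @SchemaName = 'Production', @TableName = 'Culture';
-- AutoAuditAll reportedly loops this over every base table in the database
EXEC dbo.AutoAuditAll;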


So here's what a code-generated trigger looks like. This trigger was generated for the Production.Culture table in AdventureWorks. I chose this table to save space in the article because it has only 3 columns.
ALTER TRIGGER [Production].[Culture_Audit_Update]
ON [Production].[Culture]
AFTER Update
NOT FOR REPLICATION AS
SET NoCount On
-- generated by AutoAudit on Feb 6 2007 2:09PM
-- created by Paul Nielsen
-- www.SQLServerBible.com

DECLARE @AuditTime DATETIME
SET @AuditTime = GetDate()

Begin Try
  IF UPDATE([CultureID])
    INSERT dbo.Audit (AuditDate, SysUser, Application, HostName, TableName, Operation,
        PrimaryKey, RowDescription, SecondaryRow, ColumnName, OldValue, NewValue)
    SELECT @AuditTime, suser_sname(), APP_NAME(), Host_Name(),
        'Production.Culture', 'u', Inserted.[CultureID],
        NULL, -- Row Description (e.g. Order Number)
        NULL, -- Secondary Row Value (e.g. Order Number for an Order Detail Line)
        '[CultureID]',
        Cast(Deleted.[CultureID] as VARCHAR(50)),
        Cast(Inserted.[CultureID] as VARCHAR(50))
    FROM Inserted
    JOIN Deleted
        ON Inserted.[CultureID] = Deleted.[CultureID]
        AND isnull(Inserted.[CultureID], '') <> isnull(Deleted.[CultureID], '')

  IF UPDATE([Name])
    INSERT dbo.Audit (AuditDate, SysUser, Application, HostName, TableName, Operation,
        PrimaryKey, RowDescription, SecondaryRow, ColumnName, OldValue, NewValue)
    SELECT @AuditTime, suser_sname(), APP_NAME(), Host_Name(),
        'Production.Culture', 'u', Inserted.[CultureID],
        NULL, -- Row Description (e.g. Order Number)
        NULL, -- Secondary Row Value (e.g. Order Number for an Order Detail Line)
        '[Name]',
        Cast(Deleted.[Name] as VARCHAR(50)),
        Cast(Inserted.[Name] as VARCHAR(50))
    FROM Inserted
    JOIN Deleted
        ON Inserted.[CultureID] = Deleted.[CultureID]
        AND isnull(Inserted.[Name], '') <> isnull(Deleted.[Name], '')

  IF UPDATE([ModifiedDate])
    INSERT dbo.Audit (AuditDate, SysUser, Application, HostName, TableName, Operation,
        PrimaryKey, RowDescription, SecondaryRow, ColumnName, OldValue, NewValue)
    SELECT @AuditTime, suser_sname(), APP_NAME(), Host_Name(),
        'Production.Culture', 'u', Inserted.[CultureID],
        NULL, -- Row Description (e.g. Order Number)
        NULL, -- Secondary Row Value (e.g. Order Number for an Order Detail Line)
        '[ModifiedDate]',
        Cast(Deleted.[ModifiedDate] as VARCHAR(50)),
        Cast(Inserted.[ModifiedDate] as VARCHAR(50))
    FROM Inserted
    JOIN Deleted
        ON Inserted.[CultureID] = Deleted.[CultureID]
        AND isnull(Inserted.[ModifiedDate], '') <> isnull(Deleted.[ModifiedDate], '')
End Try
Begin Catch
  Raiserror('error in [Production].[Culture_audit_update] trigger', 16, 1) with log
End Catch

The AutoAudit stored procedure does quite a bit; here's the section of the code that generates the update trigger. The first SET command starts to build the @SQL variable by setting it to the static opening portion of the trigger.

SET @SQL = 'CREATE TRIGGER ' + @SchemaName + '.' + @TableName + '_Audit_Update'
  + ' ON ' + @SchemaName + '.' + @TableName + Char(13) + Char(10)
  + 'AFTER Update' + Char(13) + Char(10)
  + 'NOT FOR REPLICATION AS' + Char(13) + Char(10)
  + 'SET NoCount On' + Char(13) + Char(10)
  + '-- generated by AutoAudit on ' + Convert(VARCHAR(30), GetDate(), 100) + Char(13) + Char(10)
  + '-- created by Paul Nielsen' + Char(13) + Char(10)
  + '-- www.SQLServerBible.com' + Char(13) + Char(10)
  + Char(13) + Char(10)
  + 'DECLARE @AuditTime DATETIME' + Char(13) + Char(10)
  + 'SET @AuditTime = GetDate()' + Char(13) + Char(10)
  + Char(13) + Char(10)
  + 'Begin Try' + Char(13) + Char(10)

-- for each column
select @SQL = @SQL
  + 'IF UPDATE([' + c.name + '])' + Char(13) + Char(10)
  + 'INSERT dbo.Audit (AuditDate, SysUser, Application, HostName, TableName, Operation, PrimaryKey, RowDescription, SecondaryRow, ColumnName, OldValue, NewValue)' + Char(13) + Char(10)
  + 'SELECT @AuditTime, suser_sname(), APP_NAME(), Host_Name(), '
  + '''' + @SchemaName + '.' + @TableName + ''', ''u'', Inserted.[' + @PKColumnName + '], ' + Char(13) + Char(10)
  + 'NULL, -- Row Description (e.g. Order Number)' + Char(13) + Char(10)
  + 'NULL, -- Secondary Row Value (e.g. Order Number for an Order Detail Line)' + Char(13) + Char(10)
  + '''[' + c.name + ']'', '
  + 'Cast(Deleted.[' + c.name + '] as VARCHAR(50)), '
  + 'Cast(Inserted.[' + c.name + '] as VARCHAR(50))' + Char(13) + Char(10)
  + 'FROM Inserted' + Char(13) + Char(10)
  + 'JOIN Deleted' + Char(13) + Char(10)
  + 'ON Inserted.[' + @PKColumnName + '] = Deleted.[' + @PKColumnName + ']' + Char(13) + Char(10)
  + 'AND isnull(Inserted.[' + c.name + '],'''') <> isnull(Deleted.[' + c.name + '],'''')' + Char(13) + Char(10) + Char(13) + Char(10)
from sys.tables as t
  join sys.columns as c on t.object_id = c.object_id
  join sys.schemas as s on s.schema_id = t.schema_id
  join sys.types as ty on ty.user_type_id = c.user_type_id
  join sys.types st on ty.system_type_id = st.user_type_id
where t.name = @TableName
  AND s.name = @SchemaName
  AND c.name NOT IN ('created', 'modified', 'RowVersion')
  AND c.is_computed = 0
  AND st.name IN ('tinyint', 'smallint', 'int', 'money', 'smallmoney', 'decimal', 'bigint',
      'datetime', 'smalldatetime', 'numeric', 'varchar', 'nvarchar', 'char', 'nchar', 'bit')
order by c.column_id

select @SQL = @SQL
  + 'End Try' + Char(13) + Char(10)
  + 'Begin Catch' + Char(13) + Char(10)
  + 'Raiserror(''error in [' + @SchemaName + '].[' + @TableName + '_audit_update] trigger'', 16, 1) with log' + Char(13) + Char(10)
  + 'End Catch'

EXEC (@SQL)

The second section of the code uses the multiple assignment variable technique to append the column-dependent portion of the trigger. The select simply finds the column names for the table by referencing the sys.columns table. Each row returned by the select (representing a column in the table) is concatenated with the rest of the code needed for the trigger and then appended to @SQL.

The final part of the code concatenates the conclusion of the trigger. Once @SQL is fully defined, a simple EXEC (@SQL) runs the code and creates the trigger. So, I invite you to download the AutoAudit script and try it for yourself. Admittedly, telling your friends that you're using T-SQL for code generation will get you some strange looks. But it's worked well in this situation, and AutoAudit easily creates consistent, fast, fixed audit trail triggers. And if you do use it, let me know how it works for you.

Paul Nielsen is the PASS Director of Global Community Development and author of SQL Server 2005 Bible and Total Training's SQL Server Development video. His website is www.SQLServerBible.com and you can meet him at SQLTeach in Montreal, where he's speaking about Nordic (O/R dbms for SQL Server) and giving the pre-con on Database Design and Optimization Best Practices. He also leads Smart Database Design Seminars around the country.



PERFORMANCE TUNING USING SQL PROFILER


BY: K. Brian Kelley

Performance Monitor, a free tool included with the Microsoft Windows operating systems, is invaluable when it comes to locating performance bottlenecks. While it may be the primary tool of the system administrator and a valuable resource for the SQL Server DBA, there are other tools a SQL Server DBA should be aware of which go beyond hardware bottlenecks. Being able to see query execution plans with Query Analyzer, SQL Server Management Studio, or your favorite third-party product can be equally useful for tuning poorly performing queries. Traces, however, whether server-side or through a tool like SQL Profiler, can be the most important tool of all. This article introduces the use of SQL Profiler and server-side traces for performance monitoring and tuning.

Long Running Queries

One of the first things most DBAs use SQL Profiler for is to capture which queries are running too long on SQL Server. These queries can often point us to where we need to tune our databases. For instance, Figure 1 shows SQL Profiler (SQL Server 2000 version) being used with the SQLProfilerTSQL_Duration trace template. I've also specified a filter which shows only those SQL batches lasting over 500 milliseconds.

Figure 1: SQL Profiler Trace for Long Running Queries

Queries Which Run Repeatedly

While finding long running queries is important, we can use SQL Server Profiler for much more. There's an old adage: put the most effort where we'll get the biggest impact. A query may take a while to run, but if it only runs once a day while another query runs relatively quickly but thousands of times an hour, we may improve performance more by concentrating on that second query.

Figure 2: SQL Profiler Trace of All Queries

Figure 2 shows a trace which uses the SQLProfilerTSQL trace template. What is shown is when a batch starts and what that batch contains. We now know what queries are being run and how often they occur. Again, this is all very useful information. However, we're not done with using Profiler to research performance issues. In my experience, one of the biggest banes to performance has been where locks on database objects prevent other data operations from being carried out. Profiler can help us see these issues, too, but first we must be able to construct our own Profiler trace templates.

Building Your Own Trace Template

I've used two different trace templates to show two different things: long running queries and all queries which do run. While each of these templates records information of value, neither of them (or any of the other prepackaged trace templates) may capture the right mix



of events and the appropriate columns for those events for your purposes. All is not lost, as SQL Profiler gives us the ability to build our own trace templates. There are two ways to approach building your own trace template: you can start from scratch with nothing selected, or you can begin by modifying an existing template. Once you've built your trace template, you can save it to be re-used again. Creating a trace template differs between SQL Profiler for SQL Server 2000 and for SQL Server 2005. For SQL Server 2000, to start one from scratch, begin with File | New | Trace Template. To copy from an existing one, start with File | Open | Trace Template; be sure to click the Save As button before starting your edits, however. When working with SQL Server 2005, start with File | Templates and then, depending on whether you want to start from scratch or from an existing template, choose New Template or Edit Template, respectively. Again, if you choose Edit Template and you aren't editing one of your own, click the Save As button on the General tab.

Looking for Blocking and Deadlocking

Consider the following scenario: an end user issues a SELECT * FROM SomeTable against a large table used for transactional purposes. This SELECT statement puts a shared table lock over the whole table which, while it'll allow other SELECT operations, stops INSERT, UPDATE and DELETE statements in their tracks until the SELECT statement completes (this assumes the default of pessimistic concurrency control). SQL Profiler can help diagnose these types of blocking issues as well as deadlock issues. A deadlock occurs when two SPIDs each hold a lock on a resource that the other SPID needs. Since neither operation can go forward, SQL Server will roll one of them back. Needless to say, this can cause unpredictable results for your applications.

In order to track blocking and deadlocking, we need to create our own custom trace template. Key events we're looking for are: Lock:Deadlock, Lock:Deadlock Chain, and Lock:Timeout. If we're using SQL Server 2005, we also have the option of adding Lock:Deadlock Graph (which produces an XML output of the deadlock) and Lock:Timeout (timeout > 0) for those lock timeouts where the timeout is greater than 0. The Lock:Timeout event is good in conjunction with SQL:StmtStarting because we can associate which statements are seeing lock timeouts (using the SPID to tie the two events together), which gives us an idea of which objects we're having blocking issues on. This allows us to backtrack and see what statements are using those objects and holding the locks, which tells us what our blocking problems are being caused by.

We can also add Lock:Acquired, but chances are we'll receive too much information, even if we filter for a specific object, such as by using the ObjectID filter. Figure 3 shows just such an example with a single query against a table.

Figure 3: Lock:Acquired Events firing from Querying a Single Table

Therefore, monitoring Lock:Acquired is of very limited value unless we can carefully control what's being executed on the SQL Server. While SQL Profiler is good for profiling deadlocks and lock timeouts, monitoring the locks themselves is probably better done as described in Microsoft KB article 271509, "How to Monitor Blocking in SQL Server 2005 and in SQL Server 2000." This article describes using a stored procedure (sp_blocker_pss80) to monitor and report blocking periodically, taking snapshots in time. While the information provided by this method can still be substantial to parse through, it is far smaller than if we tried to monitor individual locking with SQL Profiler or a server-side trace. Combining a SQL Profiler trace focused on deadlocks and lock timeouts with the steps in KB 271509 can help identify the source of blocking issues quickly. Figure 4 shows just a small excerpt of the sp_blocker_pss80 stored procedure output when a blocking situation is captured:

Figure 4: Output from sp_blocker_pss80



As you can see, the script even tells us what queries are causing the blocking. Compare this with the results from SQL Server Profiler in Figure 5:

Figure 5: Blocking in SQL Server Profiler

As the results show, unless we're dealing with a deadlock situation or the client has set a lock timeout value (by default this value isn't set), we're not going to see as much information in SQL Server Profiler as we would with the stored procedure and methodology provided in KB 271509.

Tracking Execution Plans

One thing SQL Profiler or a server-side trace can provide is the execution plans which were used at the time a query was executed. In order to do so, we have to create a custom trace and specify one of the following events:

SQL Server 2000:
Execution Plan
Show Plan All
Show Plan Statistics
Show Plan Text

SQL Server 2005:
Showplan All
Showplan All For Query Compile
Showplan Statistics Profile
Showplan Text
Showplan Text (Unencoded)
Showplan XML
Showplan XML For Query Compile
Showplan XML Statistics Profile

In the case of SQL Server 2000, these are the available events whether using the SQL Profiler from SQL Server 2000 or 2005. Of course, the SQL Server 2005 events are available only with the SQL Profiler that ships with SQL Server 2005; if we try to connect to a SQL Server 2005 instance with the SQL Profiler from SQL Server 2000, we get an error.

If we are connecting to a SQL Server 2000 server, in order to obtain execution plan (showplan) information we must include the BinaryData column in order to get anything back. However, in doing so, the data is stored in a format which is unusable outside of SQL Profiler. If we convert it to a trace table, the BinaryData column is typed as an image column, and so far as I am aware, Microsoft has not publicly published how to translate this information into a usable form. This is where the SQL Server 2005 options with XML are a huge boon. For instance, using the Showplan XML Statistics Profile event allows us to see the execution plan along with the values related to cost, etc. Since the results are in XML, we can take this information and transform it as we need to in order to evaluate the various execution plans as they occurred. We can also take the whole conglomerate of execution plans and look for key operations which would potentially indicate poor performance, such as table or index scans. By collecting all the results and filtering through the data in this fashion, we're not stuck going through each potential query one-by-one to see where the execution plans indicate performance tuning is needed.

Converting SQL Profiler Traces to Server Side Traces

One of the hardest things to learn how to do is take the settings in SQL Profiler for events and columns and write a server-side trace that does the same thing. There are certainly benefits to a server-side trace. First and foremost, we don't have to have a client actively running to capture such information. Second, we're not going to miss events because of too much activity on the server. We can configure "Server processes SQL Server trace data" (SQL Server 2000) or "Server processes trace data" (SQL Server 2005), which forces the trace server-side. Without this setting, events may not be passed to the SQL Profiler client when the SQL Server is under heavy stress. This setting forces the trace handling back to the server, and if that's what's required we might as well build a server-side trace in any case. In actuality, this is what SQL Profiler is doing, except it has a mechanism to get the information from SQL Server to be able to display it visually.

Extracting a server-side trace isn't all that difficult because we can let SQL Profiler do most of the work for us. Once we get a trace set up just the way we want it, we can export the trace to a SQL script. This script can then be run on SQL Server to set up the trace so that we don't have to have Profiler up and running. In SQL Server 2000's version of SQL Profiler, this is accomplished through File | Script Trace | and either For SQL Server 2000 or For SQL Server 7.0. With SQL Server 2005's version, we can script a trace through File | Export | Script Trace Definition | and either For SQL Server 2000 or For SQL Server 2005. Here is an excerpt from a trace definition of the Standard template (SQL Server 2005):
/****************************************************/
/* Created by: SQL Server Profiler 2005             */
/* Date: 04/22/2007 10:56:08 PM                     */
/****************************************************/

-- Create a Queue
declare @rc int
declare @TraceID int
declare @maxfilesize bigint
set @maxfilesize = 5

-- Please replace the text InsertFileNameHere, with an appropriate
-- filename prefixed by a path, e.g., c:\MyFolder\MyTrace. The .trc
-- extension will be appended to the filename automatically. If you
-- are writing from remote server to local drive, please use UNC path
-- and make sure server has write access to your network share.

exec @rc = sp_trace_create @TraceID output, 0, N'InsertFileNameHere', @maxfilesize, NULL
if (@rc != 0) goto error

-- Client side File and Table cannot be scripted

-- Set the events
declare @on bit
set @on = 1
exec sp_trace_setevent @TraceID, 14, 1, @on
exec sp_trace_setevent @TraceID, 14, 9, @on

As you can see, this trace definition includes the appropriate variable declarations that are needed for the trace and the actual commands to be run, as well as guidance on what we need to change in order to customize the script for our purposes.
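The excerpt only defines the trace; the full generated script also starts it, and from that point on you manage it with the trace procedures and functions rather than with Profiler. As a rough sketch, assuming @TraceID holds the ID returned by sp_trace_create and that a real path was substituted for InsertFileNameHere:

-- Control the trace: 1 = start, 0 = stop, 2 = close and delete the definition.
EXEC sp_trace_setstatus @TraceID, 1;
EXEC sp_trace_setstatus @TraceID, 0;
EXEC sp_trace_setstatus @TraceID, 2;

-- List the traces currently defined on the server (useful for finding the trace ID later).
SELECT * FROM fn_trace_getinfo(default);

-- Read the trace file back as a rowset once it has been written.
SELECT * FROM fn_trace_gettable('C:\MyFolder\MyTrace.trc', default);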

Concluding Thoughts

SQL Profiler is an excellent tool for helping to diagnose performance issues with our SQL Servers. We can go a step further and run server-side traces, eliminating the need for this client tool to be up and open all the time. This article touches on how to begin using SQL Profiler and traces to help in our performance tuning, but due to the flexibility of traces in general there's a lot more I didn't cover. If you're new to using this tool, I suggest breaking it out in a development or test environment and trying to cause the types of performance issues DBAs are called on to diagnose. Take the time to get SQL Profiler set up to show those events and get accustomed to how the results will appear. If you support both SQL Server 2000 and SQL Server 2005, spend some time with both versions of SQL Profiler, as there are some substantial differences between the capabilities of the two tools based on how the SQL Server database engines are instrumented in the two versions. Finally, when you are comfortable with using SQL Profiler, use it to generate your first few sets of server-side traces. As you get more comfortable with the stored procedures and functions related to traces, you may find you won't need SQL Profiler very often, except to view data. However, in a crunch, remember how to let SQL Profiler generate the guts of a trace for you. It can be a great time saver.
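If you want to manufacture the kind of blocking scenario discussed earlier in a test database, something as simple as the following will do it. dbo.TestTable is any throwaway table you create for the purpose, and the two halves are run in two separate query windows.

-- Session 1: take an exclusive lock and deliberately leave the transaction open.
BEGIN TRAN;
UPDATE dbo.TestTable SET SomeColumn = SomeColumn WHERE ID = 1;
-- ...do not commit yet...

-- Session 2: this statement will block until session 1 commits or rolls back.
SELECT * FROM dbo.TestTable WHERE ID = 1;

-- Session 1 again: release the lock once you have captured what you need.
ROLLBACK TRAN;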

Brian Kelley is a Systems Architect and DBA with AgFirst Farm Credit Bank and the regular security columnist for SQLServerCentral.com. He is also the author of Start to Finish Guide to SQL Server Performance Monitoring and the President of the Midlands PASS Chapter for South Carolina. You can contact him at brian_kelley@sqlpass.org.


UNDERSTANDING QUERY EXECUTION PLANS TO OPTIMIZE PERFORMANCE


BY: Scott Klein
One of the most difficult tasks facing a T-SQL developer is tuning queries for optimal performance. A poorly performing query can seriously degrade the operation of your SQL Server, and the users waiting on the data being returned by the query won't be too happy either. Many times DBAs will address poor query performance by tuning physical SQL Server aspects, such as adding memory or processors. In some cases this will work, but it does not address the real underlying problem. In most cases a look at the query and its related components is what is needed. Fortunately, Microsoft provides a number of tools to help address the issue of poor-performing queries. This article will take a look at a few of those, but will primarily focus on one of them: the execution plan generated by SQL Server when a query is run, and how to read and understand the information it provides. Granted, entire books can be, and probably have been, written on this topic, but the intent of this article is to get you started knowing and understanding the tools available to help you properly tune your queries.

SET SHOWPLAN Options

The SET SHOWPLAN options provide vital information regarding how SQL Server processed (or will process) the query. The SET SHOWPLAN option looks at each table in the query and provides helpful feedback such as the indexes used and the order of execution pertaining to the operations. The SET SHOWPLAN option has the ability to return results in three different formats: plain text, XML, and graphical. SHOWPLAN options are enabled by setting the option ON and disabled by setting the option OFF. For example:

SET SHOWPLAN_TEXT ON
GO
/* EXECUTE A QUERY */
SET SHOWPLAN_TEXT OFF
GO

When a SHOWPLAN option is set ON, SQL Server does not actually execute the query, but instead analyzes the query and returns specific and detailed query execution information. Query execution information will continue to be returned until the SHOWPLAN option is set OFF. Setting a SHOWPLAN option ON turns it on for the connection, not just the current T-SQL statement, so be sure to turn it off, as shown in the example above. The following sections discuss the three available SHOWPLAN options.

SET SHOWPLAN_TEXT

The SHOWPLAN_TEXT option returns detailed execution information about the T-SQL query statement as a row set with a single column but multiple rows. The rows returned are in a tree-style, hierarchical format, each row in the row set detailing a step in the execution plan which was, or will be, performed by SQL Server. Figure 1 below shows the query execution information using the SHOWPLAN_TEXT option. In the top pane, the SHOWPLAN_TEXT option is set ON, followed by the query, then the SHOWPLAN_TEXT option being set OFF. Notice in Figure 1 that the query results are not returned, as mentioned earlier; what you do see is a single column with four rows (each row beginning with |) which contains the query execution information.

Figure 1



With the SHOWPLAN_TEXT option, each node in the output tree is an individual step that the SQL Server query processor has taken (or will take) for each query step. So how does this read? This type of output is read right-to-left, top-down. Thus, the operators that are most indented produce rows consumed by their parent operators, and so on all the way up the tree. In the example above, the two innermost nodes are at the same level because they are the product of a join, with the upper node being the outer table and the lower node being the inner table. The upper node is executed first, with the lower node being executed over and over, once for each row, trying to match rows to the upper node.
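If you want to reproduce a tree like the one just described, any two-table join will do; the following uses the Northwind-style Products and Suppliers tables that the article queries later, so adjust the names to whatever test tables you have available.

SET SHOWPLAN_TEXT ON
GO
SELECT P.ProductName, S.CompanyName
FROM Products AS P
INNER JOIN Suppliers AS S ON P.SupplierID = S.SupplierID
GO
SET SHOWPLAN_TEXT OFF
GO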

SET SHOWPLAN_ALL

The SHOWPLAN_ALL option is very similar to the SHOWPLAN_TEXT option in its output format, but differs in that it returns more detailed information than the SHOWPLAN_TEXT option. While the SHOWPLAN_TEXT option returned a single column, the SHOWPLAN_ALL option returns seventeen additional columns that help better explain the execution output. Figure 2 below shows the same query using the SHOWPLAN_ALL option. For better readability in this example, the results were returned to a grid rather than text.

Figure 2

The SHOWPLAN_ALL option provides much more information and is easier to read. For example, notice the Node and Parent columns in Figure 2. The Node column is the ID of the node in the current query, while the Parent column is the Node ID of the parent step, indicating hierarchically which node belongs to which parent node. This makes reading the output much easier. The other columns are defined as follows:

PhysicalOp - For rows of type PLAN_ROWS (see the definition of the Type column below), this column contains the type of physical implementation of the node, such as Sort, Nested Loops, etc.
LogicalOp - Also for rows of type PLAN_ROWS, this column contains the type of relational operator of the node, such as Sort, Inner Join, etc.
Argument - Contains additional information about the type of operation being performed, based on the type of physical operator (the PhysicalOp column).
DefinedValues - Lists the values added by this operator. In this example, the values include the column names returned by the query (in other words, the values returned by the listed columns).
EstimateRows - Contains the estimated number of rows that this operation will produce.
EstimateIO - Contains the estimated I/O cost for this operation. Obviously, this value should be as low as you can get it.
EstimateCPU - Contains the estimated CPU cost for this operation.
AvgRowSize - Contains the estimated average size of the rows being returned by the current operator.
TotalSubtreeCost - Contains the total cost of the current operation and its child operations.
OutputList - Lists the columns being projected by the current operation.
Type - Contains the node type.
EstimateExecutions - Contains the estimated number of times that this operator will be executed during the currently running query.

Figure 3 below shows most of the other columns provided by the SHOWPLAN_ALL option.

Figure 3

SET SHOWPLAN_XML



It should be stated that, for now, the SHOWPLAN_TEXT and SHOWPLAN_ALL options are available and work. However, in future versions of SQL Server both the SHOWPLAN_TEXT and SHOWPLAN_ALL options will be removed. Microsoft suggests that you start using the SHOWPLAN_XML option as soon as possible so that you can start getting familiar with it. The SHOWPLAN_XML option returns the query execution information nicely formatted as an XML document and contains the exact same information as the SHOWPLAN_ALL option. In Figure 4 below, the top pane shows the SHOWPLAN_XML option being used with the same query as the previous examples, while the lower pane shows the execution results.

Figure 4

Obviously the execution results returned in Figure 4 above are hard to read, but simply clicking on the link in the Results window displays a new query window with the entire XML document nicely formatted, as shown below in Figure 5.

Figure 5

Granted, reading SHOWPLAN output is not the easiest thing in the world, but after a while it will become second nature and you'll be surprised at the information you are able to glean from the returned plan output. One last comment on the SHOWPLAN options: you need to have the proper permissions to use and set SHOWPLAN options, and sufficient permissions on the objects that the query being analyzed will access and execute.

Missing Index Feature

The missing index feature is probably the most underutilized performance tuning feature of SQL Server, probably because it is not well known. The fact that it is buried a few layers deep in BOL (Books Online) doesn't help. Yet this feature can come in very handy and be very beneficial when looking at query tuning. When a query is executed, the query optimizer generates a query plan and analyzes the query from many aspects. One of the things the optimizer looks at is the indexes on the tables included in the query, and it decides on the best indexes for the given filter criteria. The optimizer then looks at the existing indexes on those tables and compares them to the indexes that it thinks the tables should have. If there is a discrepancy, meaning there are indexes that the optimizer thinks the tables should have that do not exist on the tables, two things happen. First, the missing index information is stored. Second, a less-than-optimal plan is generated and used for the query. Here is where you as a T-SQL developer / DBA come in. The query optimizer stores this information for you to go look at and decide whether the indexes that the query optimizer suggested are worth implementing. So, where is this missing index information stored? SQL Server stores the missing index information in the following four dynamic management objects:

sys.dm_db_missing_index_group_stats - Contains summary information regarding the missing index groups.
sys.dm_db_missing_index_groups - Contains information regarding groups of missing indexes.
sys.dm_db_missing_index_details - Contains detailed information about a specific missing index.
sys.dm_db_missing_index_columns - Contains information about the columns of a table which are missing indexes.



You should be able to tell by these descriptions that these tables contain vital information. It should be noted that these tables are updated when a query is optimized by the query optimizer, not every time a query is run. A nice feature of these tables is that they are state consistent, meaning that if a query is stopped during execution, or a transaction is rolled back, the missing index information will remain. Sweet. These tables keep missing index information for the current instance of SQL Server with no way of resetting the data in the tables, such as deleting the current data. The only way to reset (remove) the data from the missing index system tables is to restart SQL Server; restarting SQL Server will drop all data from the missing index system tables. Also, the missing index feature is enabled by default and there is no mechanism to disable it while SQL Server is running. It can be disabled by stopping SQL Server and then restarting it with the -x argument. As a note, however, this feature does have its limitations. For example, it does not gather information for more than 500 index groups, nor does it specify any ordering of the columns that need to be included in an index. Also, the information regarding the columns on which the indexes are missing is minimal. Even with these (and a few more) limitations, the missing indexes feature is still extremely valuable.
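The article does not include a query against these objects, but a rough sketch of pulling the suggestions together looks like the following; the "improvement" expression is just one common weighting heuristic, not anything prescribed by SQL Server.

-- Rank outstanding missing-index suggestions for the current instance.
SELECT TOP (25)
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks + migs.user_scans AS times_requested,
    migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) AS improvement_score
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY improvement_score DESC;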

Understanding Execution Plans

The last topic to be discussed in this article is the visual component of execution plans. The previous topics have focused on features that help developers improve query performance. While the information they provide is excellent, it can be hard to read, and it takes valuable time to learn how to properly understand the output. Luckily, Microsoft included a graphical representation of this same information which minimizes the amount of time and effort needed to read the execution results. On the SQL Editor toolbar, exactly 3 buttons over from the Execute button, is the Display Estimated Execution Plan button. When this button is pressed, the query is evaluated and the estimated execution plan is displayed.

SELECT P.ProductName, P.UnitsInStock, S.CompanyName
FROM Products AS P
INNER JOIN Suppliers AS S ON P.SupplierID = S.SupplierID
WHERE P.UnitPrice > 20.00
AND P.Discontinued = 0
ORDER BY P.ProductName

Using the same query as the previous sections (shown above), the query execution plan is graphically displayed on the Execution Plan tab when the query is executed in the query window. This is the exact same information discussed previously in this article, but it is nicely laid out using icons to represent each specific execution step. The flow is easily readable via the arrows (again, read right-to-left), as well as additional information included with each icon, such as the cost and plan type. You can see in Figure 6 that the information shown is much easier to read than the output of the SHOWPLAN options.

Figure 6

Additionally, by moving your mouse over each icon in the execution plan, a window pops up detailing the information for that operation, as shown in Figure 7. That figure contains the same information listed earlier in this article under the SHOWPLAN_ALL section, plus some additional information; yet you will agree that the window is much easier to read. Based on the operation type, the information in the popup window will change slightly. This view also contains a few more pieces of information not included in the SHOWPLAN_ALL option. The Estimated Operator Cost contains the cost to the query optimizer when the query operation is executed. Your goal when writing a query is to try to get this number as low as possible. By getting the individual operator costs down, you will subsequently get the Estimated Subtree Cost down as well.

Figure 7



You should also notice that there is a Display Actual Execution Plan button on the same toolbar (exactly 4 buttons to the right of the Estimated button). The Estimated button parses the query and estimates an execution plan; the Actual button executes the query before the plan is generated. Once you have all of this information, it is up to you to decide what to do with it. Is the information you are getting back good? Based on the information returned from the execution plan, you could try re-writing portions of the query to try to get the operation execution costs down.

Summary

Query tuning can be difficult, but it doesn't have to be. Knowing what tools are available to you and how to properly use them can make the task of improving the performance of your queries much more enjoyable. The intent of this article was to show you exactly that, by discussing several of the tools and features included with SQL Server 2005 that you have at your disposal and how to understand the information these tools provide.

Scott Klein is an independent consultant with passions for all things SQL Server, .NET and XML. He is the author of Professional SQL Server 2005 XML and Professional WCF Programming: .NET Development with the Windows Communication Foundation, both by Wrox, writes the bi-weekly feature article for the SQL PASS Community Connector, and has contributed articles to both Wrox (www.Wrox.com) and TopXML (www.TopXML.com). He frequently speaks at SQL Server and .NET user groups. When he is not sitting in front of a computer or spending time with his family he can usually be found aboard his Yamaha at the local motocross track. He can be reached at ScottKlein@SqlXml.com


SQL CLR - HOW, WHY & WHY NOT


BY: Chuck Heinzelman

Ever since the announcement that the CLR was going to be integrated into SQL Server, I've noticed two general reactions among database professionals. Those who would classify themselves as database administrators generally had a reaction along the lines of "there is no way that I'm going to let CLR code run on my server." Those who would classify themselves as developers generally responded by saying something like, "now I'll never need to write T-SQL again!" Of course, both reactions are just that: reactions. The reality of the situation lies somewhere in the middle. Database administrators should be willing to allow CLR-based code to run in their databases in certain controlled circumstances. Database developers should realize that T-SQL is still the king for accessing data, and CLR-based stored procedures and functions should be used in situations where corresponding T-SQL functionality either doesn't exist or is too complicated to be written efficiently. In this article I will attempt to explain the what, whys and how: what is the CLR, why would you want to use it, why wouldn't you want to use it, and how do you use it.

What is the CLR?

By now I'm pretty sure that most database professionals are familiar with the acronym CLR. But does everyone really know what it is and what it does? While it is not my intention here to dig deep into the CLR, a little background information is required. CLR stands for Common Language Runtime. It is a technology that came out when Microsoft released the .NET Framework back in 2002. It is, as the name implies, a runtime environment. Applications written in .NET languages (such as VB.NET and C#) are compiled to an intermediate language (specifically, the Microsoft Intermediate Language). This intermediate language is then loaded by the Common Language Runtime and turned into platform-specific byte code. Therefore, .NET-based applications are theoretically portable to non-Microsoft operating systems that have ports of the Common Language Runtime, such as the Mono project. Applications written to run within the CLR environment are referred to as managed; you'll sometimes hear the term "managed code" when someone is referring to a .NET application. You can still write unmanaged applications for Windows using languages such as C++ and Visual Basic 6.0. Managed and unmanaged code can even co-exist in the same application, but that is well beyond the scope of this article.

The .NET platform provides a common type system that can be used across all .NET languages. Therefore, C# applications can call VB.NET class libraries without having to convert data types. You could, if you wanted to, even write a data access layer in COBOL that is used from a Python front end, provided you have the appropriate .NET compilers. This interoperability provides a major advantage, since you can leverage your resources where they can work best. If the majority of your system is going to be written in C#, you can still bring in a user interface developer for your web site who only knows how to develop in VB.NET, and things will work perfectly together.

Over the course of the article, I will delve back into the CLR realm where necessary to explain some of the key concepts that I am covering. I promise that I'll try not to get too .NET-ish.

Why Should I use the CLR?


My general opinion is that CLR use in the database engine should be limited to only where it is needed. While everyone will have their own interpretation of what "needed" is, there are a few situations where there will be little ambiguity. So, where do I think that it is appropriate to use CLR-based code in the database? First of all, there are functions that naturally fall into database functionality but are not supported well, or at all, in T-SQL. In my opinion, regular expressions fall into this category. There are many occasions where I could have benefited from including a regular expression in a WHERE clause, but up until now I just haven't had that ability. The .NET Framework has an extensive set of classes for dealing with regular expressions. One possible use for CLR integration would be to write a series of user-defined functions to expose the .NET regular expression functionality to SQL Server. As a matter of fact, the example that I will be



using when I get into actually writing the CLR-based code in this article is a user-defined function that evaluates regular expressions.

Another area within SQL Server that can benefit from CLR integration is the type system. You have been able to add user-defined types to SQL Server for quite a while now, but you were limited as to what you could do with them. Now, with CLR integration, you can create rich user-defined types to help suit your business needs and greatly expand SQL Server's built-in type system.

Yet another area of possible use that I would like to touch on is proprietary calculations. Let's say that you are working for a company that does financial analysis and has a proprietary calculation for determining someone's credit risk. That calculation takes as input a number of different parameters and returns a rating from 1 to 5 on their risk. Since the calculation is so mission critical, the source code is kept under lock and key and the compiled versions of it are closely guarded. Up until now, the calculation has only been included in the compiled Windows-based applications that are used by the financial analysts. Members of senior management are asking for summary reports containing potential clients and their associated credit risk. Without CLR integration, the only way to get that risk score was to have the user interface code write it into the database as a static field. The problem with this approach is that a change in someone's underlying information would require an extra step to ensure that the credit risk field was kept up to date. Now that the CLR has been integrated into SQL Server, the .NET code that performs the proprietary calculation can be integrated as a user-defined scalar-valued function. It could be called inline in a query, and the results can be used in a report that senior management can run at any time. Through CLR integration, you accomplish your goal without significantly expanding the risk of exposing your proprietary calculation any further than it has already been exposed through your applications.
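To make the regular expression scenario concrete: once a UDF like the one built later in this article (dbo.RegEx, returning bit) is deployed, queries of the following shape become possible. The table, column, and simplified pattern here are hypothetical and purely illustrative.

-- Return only rows whose Email column roughly matches an e-mail pattern.
SELECT CustomerID, Email
FROM dbo.Customers
WHERE dbo.RegEx(Email, '^[\w\.-]+@[\w\.-]+\.\w{2,4}$') = 1;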

Why Shouldn't I use the CLR?

In my opinion, the CLR should not be used as an excuse to avoid having true business and data layers in your application by simply embedding procedural logic into your database. I see this as one of the biggest opportunities for misuse of CLR integration. I'm sure that there are going to be stories coming out about how systems are implemented with their entire business logic coded in CLR-based stored procedures. This is really nothing new. Prior to the CLR being integrated into SQL Server you could write your entire business logic into stored procedures; you just needed to write those procedures in T-SQL. CLR integration will make this much easier to do.

I'm not saying that there are never good reasons to embed business logic into the database tier; I have personally done it many times. There are situations, such as when you need to do a manual compare of two very large data sets, where embedding that logic in the database will perform better than pulling those large data sets over the network, processing them, and then returning the results to the database. As a rule, though, I don't like the idea of embedding all of the business logic into the database.

Having said that, there is a place where the CLR can help to clean up situations that you inherit. Say, for example, that you inherit a system where all of the business logic is embedded in the database in T-SQL stored procedures. When you approach your project manager for approval to extract all of this into a business layer, they deny the request due to the amount of rework that would be required to make use of the new business layer. Your counter-proposal could be for them to allow you to break the T-SQL stored procedures into CLR-based stored procedures. This would provide the following benefits:

Procedural Logic Optimization - Let's face it, while T-SQL allows you to perform procedural logic such as looping and flow control, it isn't the best at it. That procedural logic can be moved into a CLR-based stored procedure that makes calls into smaller T-SQL-based stored procedures to perform the set-based work.

Additional Development Resources - If you work in environments that are similar to the ones that I work in, there are many more people who know VB.NET or C# than know T-SQL, or at least know it well enough to write complex stored procedures using it. By moving the business logic stored procedures into a CLR language, you are expanding the pool of people who can provide support and maintenance.

Potential Future Reuse - If written correctly, CLR-based stored procedures could be reworked into a dedicated business tier when the time comes.

No Changes to Calling Applications - To the world outside of SQL Server, a stored procedure is a stored procedure, whether it is CLR-based or written in T-SQL. This means that calling applications do not need to change at all when you convert from a T-SQL stored procedure to a CLR stored procedure.

How do I use the CLR?


For this article, I am making the assumption that in addition to SQL Server 2005 you have Visual Studio 2005 installed. There are ways to build CLR applications and libraries without using Visual Studio but it is easier, for demonstration purposes, to do the work through Visual Studio. Also, all of the code will be in C#.



The first thing that you need to know is that CLR integration is disabled by default. This, I think, will cause more support calls from developers trying to implement CLR-based stored procedures than anything else. To turn on CLR integration, run the script in Figure 1. Alternately, you can turn off CLR integration by running the script in Figure 2.
USE MASTER
GO
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO

Figure 1 - Script to turn on CLR integration

USE MASTER
GO
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'clr enabled', 0;
GO
RECONFIGURE;
GO

Figure 2 - Script to turn off CLR integration

The next thing that you will need to know is that SQL Server data types are mapped to specific CLR data types. The table in Figure 3 (taken from SQL Server Books Online) lists the SQL Server data types and their corresponding CLR data types. When passing data into and out of CLR-based procedures, you should use the types in the column "CLR data type (SQL Server)". When passing data around within the CLR-based code, you should convert them to the types in the column "CLR data type (.NET Framework)".

SQL Server data type | CLR data type (SQL Server) | CLR data type (.NET Framework)
varbinary | SqlBytes, SqlBinary | Byte[]
binary | SqlBytes, SqlBinary | Byte[]
varbinary(1), binary(1) | SqlBytes, SqlBinary | byte, Byte[]
image | None | None
varchar | None | None
char | None | None
nvarchar(1), nchar(1) | SqlChars, SqlString | Char, String, Char[]
nvarchar | SqlChars, SqlString (SqlChars is a better match for data transfer and access; SqlString is a better match for performing String operations) | String, Char[]
nchar | SqlChars, SqlString | String, Char[]
text | None | None
ntext | None | None
uniqueidentifier | SqlGuid | Guid
rowversion | None | Byte[]
bit | SqlBoolean | Boolean
tinyint | SqlByte | Byte
smallint | SqlInt16 | Int16
int | SqlInt32 | Int32
bigint | SqlInt64 | Int64
smallmoney | SqlMoney | Decimal
money | SqlMoney | Decimal
numeric | SqlDecimal | Decimal
decimal | SqlDecimal | Decimal
real | SqlSingle | Single
float | SqlDouble | Double
smalldatetime | SqlDateTime | DateTime
datetime | SqlDateTime | DateTime
sql_variant | None | Object
User-defined type (UDT) | None | Same class that is bound to the user-defined type in the same assembly or a dependent assembly
table | None | None
cursor | None | None
timestamp | None | None
xml | SqlXml | None

Figure 3 - SQL Server and CLR Data Types

Last, but not least, let's get into the example. Regular expressions can be extremely handy when performing string comparisons. They make it easy to determine if a string is a valid e-mail address, web address, or postal code, just to name a few of the comparisons that can be done. Regular expressions have been around for years, but there isn't any direct support for them within SQL Server. The .NET Framework, however, has a great set of classes for working with regular expressions. This example will show you how to create a user-defined function that will compare an input string to a regular expression and tell you whether or not the string matches the expression. I would suggest creating a new database to use for these examples; all of the example code will reference a database that I have created called SQLServerStandard. To get started, launch Visual Studio 2005. Create a new project by selecting File -> New -> Project. You should be presented with a dialog similar to that shown in Figure 4. The dialog will show different options, depending on what languages and features you have installed.
Figure 4 - Visual Studio New Project Dialog

In the Project types: tree, expand the tree for Visual C# and click Database. In the Templates: list, click SQL Server Project. For this demonstration, please enter the following information into the appropriate areas of the form:

Name: RegExUDF
Location: C:\temp\SQLServerStandard
Solution Name: RegExUDF



Also, please check the "Create directory for solution" checkbox and uncheck the "Add to Source Control" checkbox (unless you want to put this demo code into source control, that is). Once you have entered all of the information, click OK. You may be prompted with a dialog asking if you want to enable SQL/CLR debugging; for the examples in this article, you can answer no. After you click OK, Visual Studio will create a new Database project. You will be prompted to either select or create a database reference. If you have created a reference to your target database in the past (through earlier use of Visual Studio to manage databases), you can select it from the list. Otherwise, click the Add New Reference button. You will be presented with a dialog similar to the one shown in Figure 5.

Figure 5 - New Database Reference Dialog

Enter (or select) the name of the server where your database resides, select your desired authentication mode (and enter the user name and password, if necessary), select the target database, and click OK. You should be returned to the Add Database Reference dialog, where the reference that you just created should be shown in the list. Select the reference and click OK. When the solution opens, you should see a Solution Explorer window that looks similar to that shown in Figure 6. If you do not see a Solution Explorer window, you can display it by selecting View -> Solution Explorer.

Figure 6 - Solution Explorer Window

The next thing that we need to do is to actually add the CLR-based user-defined function to the project. To do this, right-click on the project name in the Solution Explorer window and choose Add -> User-Defined Function from the context menu. You should be presented with a dialog similar to the one shown in Figure 7. Ensure that User-Defined Function is selected in the Templates: list, enter RegEx.cs in the Name: text box, and click Add.

Figure 7 - Add New Item Dialog

A code window similar to that shown in Figure 8 should be opened, although the color and font scheme will most likely differ from the one shown here!

Figure 8 - Initial RegEx.cs Code Window

Now I need to take a few minutes and dive into the CLR to explain what is going on in the code above. The first thing to note is that C# is a case-sensitive language; therefore SqlString is not the same as sqlstring. If we were writing this code in Visual Basic .NET, SqlString and sqlstring would be the same, although the editor would most likely make them the same case in the code to make things more readable.

The next thing to point out is the structure of the code. In the code above there is a single class, and that class contains a single method. The class is named UserDefinedFunctions, but that name could be almost anything you want it to be. Note that the name of the class and the name of the code file do not need to match. The class has two modifiers: public and partial. Public indicates that there are no inherent restrictions on who can access this class (restrictions could be imposed by security privileges outside of the class, but that is



beyond the scope of this article). Partial indicates to the compiler that the class can be split among multiple files, but it does not need to be.

The single method in the class is called RegEx. It has three modifiers associated with it: public, static and SqlString. Public indicates that the method can be called by anyone who has access to the class. Static indicates that the class does not need to be instantiated (turned into an object) to call the method. SqlString indicates the return type of the method.

The text in the square brackets before the name of the method is called an attribute. Attributes are used as a way to extend the functionality provided by the .NET Framework. You can think of attributes as meta-data for the methods in your code. If all of this seems daunting to you, don't worry. If there is something that is required for your code to work within SQL Server, it will be included in the Visual Studio template. You might need to extend what is provided in the template to meet your needs, but the basics provided in the template should get you going.

If we were to build, deploy and call the code as it stands right now, calling the function would return the text "Hello" to the calling query. We want the function to do a little more than that. Replace the entire RegEx method with the code in Figure 9. When you are done, you should have a code window similar to that shown in Figure 10.

public static SqlBoolean RegEx(SqlString SearchString, SqlString Expression)
{
    return new SqlBoolean(System.Text.RegularExpressions.Regex.IsMatch(SearchString.ToString(), Expression.ToString()));
}

Figure 9 - RegEx Method Code

Figure 10 - RegEx Code Window After Modification

The new version of the method is still declared as public and static, but the return type has been changed to SqlBoolean, which will turn into a bit in SQL Server. Unlike the original version, which did not accept any parameters, the new version accepts two parameters of type SqlString, which translates into a Unicode character type in SQL Server (nvarchar or nchar). The first parameter (SearchString) is the string that you want to compare to a regular expression. The second parameter (Expression) is the actual regular expression to be applied in the comparison. Since the regular expression syntax is quite involved, I'm not going to dig into it in any depth. There are many resources on the Internet that can help you build regular expressions; the one that I am using in my example actually comes from the MSDN library.

The code within the method is actually only one line of C# code, even though it is broken up over four lines in the editor window shown in Figure 10. The semicolon, not a carriage return or line feed, marks the end of a line of code in C#. The following pseudocode describes what is done in this line of code:

Compare the search string to the expression
If there is a match, indicate true; otherwise, indicate false
Convert the .NET bool to a new SqlBoolean
Return the SqlBoolean

One thing to note is that we need to convert the SQL Server CLR data types to their corresponding .NET CLR data types in order to work with them, and then convert back to the SQL Server CLR data types to be passed back to SQL Server. That is why we need to call the ToString() method when we use the input parameters (which are of type SqlString); ToString() converts the variables of type SqlString into their corresponding string type. On the way back out, we are taking the bool output from the IsMatch method and converting it into a new SqlBoolean.

Now that we have the code, it needs to be built and deployed before we can actually use it. To build code in Visual Studio, you first need to select the appropriate build mode, either Debug or Release, from the dropdown list in the toolbar. Unless you are planning to step through the code with a debugger, I would recommend building the code in release mode. Next, select Build -> Build XXX, where XXX is the name of your project (RegExUDF in our case). If there are build errors, the error list will be displayed. If the build is successful, you should see "Build succeeded" on the left side of the Visual Studio status bar. I am going to show you two methods for deploying your code to SQL Server: through T-SQL scripts and through Visual Studio itself.



When deploying through script, there are two things that you need to do: create the assembly, then create the function. An assembly is the .NET term for a compiled unit of code, whether it is a Dynamic Link Library (dll) or an Executable (exe). The assemblies that we will be registering with SQL Server will be dlls. To create the assembly, execute the script in Figure 11 against your target database. Remember to change the path if you used a different target path than the path outlined in this example.
CREATE ASSEMBLY RegExUDF
FROM 'C:\temp\SQLServerStandard\RegExUDF\RegExUDF\bin\Release\RegExUDF.dll'
WITH PERMISSION_SET = SAFE;

Figure 11 - Code to Create Assembly

The Create Assembly call takes the name of the assembly, the path to the .dll, and a PERMISSION_SET. The PERMISSION_SET is extremely important, as it defines what access the code within the assembly has when run. The default mode is SAFE, which is the most restrictive method of access; SAFE assemblies cannot access resources outside of SQL Server. There are two other modes available. EXTERNAL_ACCESS means that the code can access external resources on the server, such as the file system and the registry. UNSAFE is the least restrictive PERMISSION_SET, and the code is virtually unrestricted. As a rule, you should use the least open PERMISSION_SET when you register an assembly. In other words, if your assembly doesn't require access to resources outside of SQL Server, mark it as SAFE, not EXTERNAL_ACCESS.

The next step in deploying CLR-based code through script is to create the object, a function in our case. To create the function, execute the script in Figure 12 against your target database.

CREATE FUNCTION RegEx (@SearchString as nvarchar(255), @Expression as nvarchar(255))
RETURNS bit
AS EXTERNAL NAME RegExUDF.UserDefinedFunctions.RegEx;

Figure 12 - Code to Create Function

As with a standard T-SQL function, the Create Function call takes the name of the function, any parameters required, a return type, and the function body. In our case, the function body refers to the method in the assembly that we registered. The name RegExUDF.UserDefinedFunctions.RegEx is the full name of the RegEx method that we created.

There is, as I mentioned before, a way to deploy this function through Visual Studio. To deploy through Visual Studio, select Build -> Deploy XXX, where XXX is the name of the project you are deploying. The deployment handles both creating the assembly and creating the function. You can set the PERMISSION_SET as a property of the project. As with the scripting option, it will default to SAFE.

Now that the function is deployed, you can use it just like you would use any other UDF within SQL Server. The code shown in Figure 13 will execute the function twice. In each case, the expression is looking for a valid e-mail address. The first SELECT statement passes a valid e-mail address as the search string, and the second passes an invalid e-mail address as the search string. The output is shown in Figure 14.

SELECT dbo.RegEx('chuck_heinzelman@sqlpass.org',
    '^([\w\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$');
GO
SELECT dbo.RegEx('chuck_heinzelman.sqlpass.org',
    '^([\w\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$');
GO

Figure 13 - UDF Execution Code

Figure 14 - UDF Execution Output

Conclusion

CLR integration in SQL Server has caused both excitement and fear, depending on your point of view. In my opinion, the excitement is warranted and the fear is unfounded. Given the fact that code can be deployed with extremely restrictive rights, there is actually little chance that something could go drastically wrong. The example that we have gone through here is one of the easiest ways to use the CLR in SQL Server. You can also create custom aggregations and user-defined types, which in my opinion are the most difficult CLR objects to create. You can download the code used in this example at http://www.sqlserverstandard.com/downloads/200705/heinzelman.zip.


PASS
The PASS Editorial Committee would like to welcome you to a new regular column for the SQL Server Standard magazine. With DBA 101, we will attempt to offer an introductory look at the editorial focus for each issue. If you have any comments, please send an e-mail to editorial@sqlpass.org.

DBA 101 PERFORMANCE TUNING 101

When you are just starting out in the database world, performance tuning seems like a daunting task. There are so many places where you might need to go to tune something. Should I look at the table structures, storage, memory, indexes, or even the query itself? The point of this article is not to make you a tuning expert, nor is it meant to represent the perfect way to do things. My goal is to give you some ideas on where to start whether you are a seasoned DBA or someone just getting started.

Setting Performance Goals


Before you can attempt to tune a system, you need to have a desired performance target in place. If you don't have a well-defined target, you'll be tuning forever. These targets are usually expressed as a desired response time for an interactive application or a desired throughput rate for batch processing. The following are a few examples of performance targets:

The application must have a turn-around time of 300 milliseconds from the time that the user submits a request for data to the time that data is returned.

The batch processing must be able to handle sustained throughput of 10,000 records per hour, with periodic bursts of up to 100,000 records per hour.

An important thing to derive from performance targets is what the real expectations are. In the first example, the requirement is that the data is returned in 300 milliseconds, but it doesn't mention whether that is the first piece of data or the last. In the second example, the goal states that the system must be able to handle bursts of up to 100,000 records per hour, but it does not state how frequent those bursts are or whether there can be some lag time in processing them. Often you will find that performance goals need some additional clarification to make sure that they are met correctly. Don't be afraid to ask for clarification if a goal seems ambiguous to you. And if the goals don't exist, ask the appropriate people for help in defining what those goals should be.

Designing for Performance


If you are fortunate enough to be designing and building a new system, there are a few tricks that you can have up your sleeve to help meet the stated performance goals. When designing a new application, you can write your queries in such a way as to optimize index use, and you can look at different ways of accessing the data. SQL Server provides many different ways to design for performance. Some of those methods are:

Indexed Views: Indexed views provide materialization of the data that underlies a view. Unlike a traditional view, where the indexes on the underlying tables must be used for performance, indexed views can have their own indexes to meet querying needs.

Multiple Filegroups: Table data and indexes can be spread across multiple filegroups, and those filegroups can be put on multiple physical volumes. When data is spread across multiple volumes, multiple pieces of information can be fetched from disk at the same time, thereby reducing I/O-related bottlenecks.

Partitioned Tables: In addition to splitting whole tables up among filegroups, you can also split individual tables across multiple filegroups, which can in turn be placed on multiple physical volumes. Partitioning can be beneficial in situations where you have multiple ranges of information that live concurrently, or when you have ranges of data that will need to be dropped from the database periodically.
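As a sketch of the first option, the statements below build an indexed view over a hypothetical Sales table. The object names are assumptions; the key requirements are that the view is created WITH SCHEMABINDING and, for an aggregate view, that it includes COUNT_BIG(*):

-- Indexed views must be schema-bound and reference tables by two-part names.
-- LineTotal is assumed to be NOT NULL (SUM over a nullable column is not allowed here).
CREATE VIEW dbo.vSalesByProduct
WITH SCHEMABINDING
AS
SELECT ProductID,
       COUNT_BIG(*)   AS OrderCount,
       SUM(LineTotal) AS TotalSales
FROM dbo.Sales
GROUP BY ProductID;
GO

-- The unique clustered index is what materializes the view's result set.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesByProduct ON dbo.vSalesByProduct (ProductID);
GO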

Many of these techniques can be implemented even if you are not designing an application from the ground up. You can take an existing or off-the-shelf application and spread objects across multiple filegroups on multiple volumes in an attempt to improve performance. At a minimum, you should put your log files on a separate volume from your data files.
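As a sketch of that last point, an existing log file can be pointed at a different volume with ALTER DATABASE; the database name, logical file name, and path below are placeholders, and the physical .ldf file has to be moved while the database is offline before it is brought back online:

-- Point the log file at a new location (placeholder names and path).
ALTER DATABASE SalesDB
MODIFY FILE (NAME = SalesDB_log, FILENAME = N'L:\SQLLogs\SalesDB_log.ldf');

ALTER DATABASE SalesDB SET OFFLINE;
-- Move the .ldf file to the new volume using the operating system, then:
ALTER DATABASE SalesDB SET ONLINE;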

Monitoring
Once a system is up and running, you should monitor its performance periodically to make sure that things are running optimally. To make this an effective venture, you should have a performance baseline to work against. To get this baseline, you should monitor the servers and your applications when they are running optimally. Once you have the baseline, you can perform periodic server monitoring to look for deviations from that baseline. If you have access to the archived sessions from prior PASS conferences, our president Kevin Kline has given a great session on baselining that is worth checking out.
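One lightweight way to keep part of a baseline, without any extra tooling, is to snapshot a few of SQL Server's own counters into a table on a schedule. The table and counter list below are only an illustration:

-- Illustrative baseline table.
CREATE TABLE dbo.PerfBaseline
(
    CaptureTime  datetime      NOT NULL DEFAULT GETDATE(),
    CounterName  nvarchar(128) NOT NULL,
    CounterValue bigint        NOT NULL
);

-- Snapshot a handful of counters while the system is known to be healthy,
-- then run the same insert periodically and compare later readings.
-- Note: counters whose names end in /sec are cumulative, so a rate requires two snapshots.
INSERT INTO dbo.PerfBaseline (CounterName, CounterValue)
SELECT RTRIM(counter_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Page life expectancy', N'Batch Requests/sec', N'User Connections');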

When it comes to tools for monitoring performance, I use the tools that are available right out of the box: Windows Performance Monitor and SQL Server Profiler. There are also some DBCC commands that can be very useful, such as DBCC SHOWCONTIG for checking index fragmentation. I personally tend to monitor disk utilization, memory and processor utilization, and a subset of the SQL Server performance counters from within PerfMon. I also look at query execution times and I/O statistics from within Profiler. When I see things that don't look normal, such as an abnormally high number of reads for a given query, I will dig deeper into the root cause of the problem.
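For example, a quick fragmentation check on a single table might look like the following; the table name is a placeholder, and on SQL Server 2005 the sys.dm_db_index_physical_stats DMV is the newer alternative to DBCC SHOWCONTIG:

-- Report fragmentation for every index on one (hypothetical) table.
DBCC SHOWCONTIG ('dbo.Orders') WITH ALL_INDEXES, TABLERESULTS;
-- Scan Density close to 100% and low Logical Scan Fragmentation
-- generally indicate healthy indexes.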

Tuning
Once you have monitored the situation and have realized that you are not meeting your performance goals, you can begin to tune. When many people think of tuning, they automatically jump to indexes. This is often a good leap to make. However, there are other things that need to be investigated. You might have appropriate indexes on all of your tables, but they are not being used because of the way that the query is written. If this is the case, you could rewrite the query in a different way so that the optimizer will choose an existing index. In some cases, the optimizer will choose a bad query plan because your statistics are out of date. Periodically updating the statistics will help to solve this problem. Also, indexes that are overly fragmented can cause physical I/O to be slow.
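That kind of maintenance boils down to a couple of short commands; the table and index names here are hypothetical:

-- Refresh the optimizer statistics for one table
-- (sp_updatestats can do this for the whole database).
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Rebuild a heavily fragmented index; on SQL Server 2005,
-- ALTER INDEX ... REBUILD replaces DBCC DBREINDEX.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD;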

Beyond index tuning and query rewrites, you can look at the performance of your physical hardware. Does your server have enough memory to keep data and query plans cached for a sufficient period of time? Are you over-taxing one of your drive arrays? Are you having networking issues? All of these are possible causes of poor system performance.
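Some of those questions can be answered from inside SQL Server itself. For instance, cumulative I/O stall times per database file give a rough picture of whether a drive array is being over-taxed; the following is a sketch against the SQL Server 2005 DMVs:

-- Reads, writes, and stall times per database file since the last restart.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms,
       vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;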

Conclusion
One thing that you need to remember is that in many cases perception is everything. Your server could be performing perfectly, but other factors such as network difficulties or poorly written application code could be the real problem. Many times, the problem will come down to several small issues that combine to make one bigger issue. When you need to deal with other departments on performance-related issues, such as application development and network support, make sure that you take your tact and your facts with you. People can sometimes get defensive when approached about performance issues. My overall advice is to get all of your information straight before approaching other departments and to approach them in a constructive manner. That way, you stand the best chance of improving the overall situation.

401 North Michigan Avenue, Chicago, Illinois 60611-4267 Phone: 312.527.6742 Fax: 312.245.1081

E-mail: passhq@sqlpass.org Web: www.sqlpass.org

Attend the LARGEST SQL Server Educational Event in 2007!

Microsoft SQL Server Users Conference & Expo


September 18-21, 2007, Colorado Convention Center, Denver, Colorado
The most impressive collection of product experts, technical sessions and networking opportunities all under one roof makes the PASS Community Summit a must-attend event for DBAs, architects and developers who depend on SQL Server.

100+ Educational Sessions: Choose from a variety of educational sessions presented by members of the Microsoft SQL Server development team, MVPs, recognized experts in the industry and users like you! Sessions are divided into four tracks, including:

Database Application Development
Enterprise Database Administration and Deployment
Data Warehousing and Business Intelligence
Professional Development

2007 PASS Community Summit

Attend the only SQL Server event hosted for users, by users, resulting in unparalleled education to help you succeed in your career. For more information, visit: http://www.sqlpass.org/
