
Case Study: Automate Deadlock Notification

ABOUT THE ENGAGEMENT
The client is a media and entertainment giant engaged in the production and marketing of entertainment, news, and information products and services to a global customer base. The IT backbone supporting the organization is a complex network with a few thousand servers spread across multiple platforms, ranging from legacy to open source. Around 1,000 databases are currently in use for various applications.

BUSINESS NEED
Application processes are killed with information. (Open questions: killed with or without information? What processes are these and what are they used for?)

APPROACH
- Automate the deadlock information report from the error log.
- Generate the report as soon as a deadlock occurs on the SQL Server. (What automation was done?)

CHALLENGES
- The Application Team raises a ticket with the Database Team; there is a considerable delay in raising the ticket.
- The Database Team works to a 24-hour SLA to send the report.
- The Database Team collects the data manually from the error log, formats it, and sends it to the app team.
- Time taken by the Database Team per deadlock: 45 minutes.

SCOPE
- Reduce the non-value-add work for the Database Team.
- Reduce the turnaround time for getting the information to the app team.

BUSINESS VALUE DELIVERED
- Turnaround time for the Application Team reduced to 0-5 minutes {from ?}.
- Non-value work for the DB Team removed. (How much time saved?)
- Number of alerts for deadlocks reduced to zero {from ?}.
- Number of tickets for deadlock information reduced to zero {from ?}.
- Risk reduction: the DB Team does NOT need to log into the server to get this information.
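The automation described above boils down to scanning the SQL Server error log for deadlock entries and mailing a formatted report. A minimal sketch, assuming a hypothetical errorlog line format (real output with trace flag 1222 enabled is much richer) and omitting the actual e-mail send:

```python
import re

# Hypothetical errorlog line shape; adjust the pattern to the real log.
DEADLOCK_RE = re.compile(r"deadlock victim.*spid\s*=?\s*(\d+)", re.IGNORECASE)

def find_deadlock_victims(errorlog_lines):
    """Return the session ids reported as deadlock victims."""
    victims = []
    for line in errorlog_lines:
        m = DEADLOCK_RE.search(line)
        if m:
            victims.append(int(m.group(1)))
    return victims

def format_report(victims):
    """Build the message body the script would send to the app team."""
    if not victims:
        return "No deadlocks found in the scanned window."
    lines = ["Deadlock report (auto-generated):"]
    lines += [f"  victim spid {v}" for v in victims]
    return "\n".join(lines)

sample = [
    "2011-05-02 10:15:01 spid4s  deadlock-list",
    "2011-05-02 10:15:01 spid4s  Deadlock victim: spid = 87",
]
print(format_report(find_deadlock_victims(sample)))
```

In production the script would tail the live errorlog and hand the formatted text to the mail subsystem, so the app team gets the report without raising a ticket.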

2011 Copyright Genpact. All Rights Reserved.

Case Study: Cluster Resource Failback After Patching


ABOUT THE ENGAGEMENT
The client is a media and entertainment giant engaged in the production and marketing of entertainment, news, and information products and services to a global customer base. The IT backbone supporting the organization is a complex network with a few thousand servers spread across multiple platforms, ranging from legacy to open source. Around 1,000 databases are currently in use for various applications.

BUSINESS NEED
Windows cluster servers are patched every month. After patching, services have to fail back to their resident node.

APPROACH
- A conference call is set up between the database and compute teams during the patching window to ensure the resources are on their resident node.
- The Compute Team has been provided a checklist of resources and their resident nodes.
- The Database Team has alerts set up for resources NOT on their resident nodes.
- Auto-Reboot and Auto-Update features disabled, to ensure all clusters are on the same patch level.

CHALLENGES
- Compute Teams patch hundreds of servers at the same time.
- No alerting mechanism if services are on the wrong node.
- Outages due to resources NOT on their resident node.
- High number of memory/CPU alerts due to resources NOT on their resident node.
- Auto-Reboot and Auto-Update features enabled on several servers, leading to inconsistency between patch levels.

SCOPE
- Reduce the non-value-add work for the Database Team.
- Reduce the outages due to resources not on their resident node.
- Reduce the number of alerts due to resources not on their resident node.

BUSINESS VALUE DELIVERED
- Non-value work for the DB Team removed. (How much time saved?)
- Number of CPU and memory alerts reduced to zero {from ?}.
- Number of outages due to resources NOT on their resident node, and of auto-reboots due to auto-updates, reduced to zero {from ?}.
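The resident-node check is essentially a comparison between a static checklist and the cluster's current owner nodes. A minimal sketch with hypothetical resource and node names:

```python
# Hypothetical checklist: cluster resource -> designated resident node.
RESIDENT_NODES = {
    "SQL-GRP-A": "NODE1",
    "SQL-GRP-B": "NODE2",
    "QUORUM": "NODE1",
}

def misplaced_resources(current_owners):
    """Compare current owner nodes against the resident-node checklist
    and return the resources that still need to be failed back."""
    return sorted(
        res for res, node in current_owners.items()
        if RESIDENT_NODES.get(res) not in (None, node)
    )

# State observed right after a patching window: one group is on the
# wrong node and should trigger an alert / a failback.
after_patching = {"SQL-GRP-A": "NODE2", "SQL-GRP-B": "NODE2", "QUORUM": "NODE1"}
print(misplaced_resources(after_patching))
```

In practice the current owners would come from the cluster API (e.g. a `Get-ClusterGroup` query on Windows) instead of a literal dictionary.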


Case Study: ASM Scan Order Change

ABOUT THE ENGAGEMENT
The client is a media and entertainment giant engaged in the production and marketing of entertainment, news, and information products and services to a global customer base. The IT backbone supporting the organization is a complex network with a few thousand servers spread across multiple platforms, ranging from legacy to open source. Around 1,000 databases are currently in use for various applications.

BUSINESS NEED
ASM does NOT use multipath or PowerPath pseudo-devices; instead it uses the single devices, which amounts to a single point of failure in RAC environments and causes I/O failures during storage-device failover.

APPROACH
- Bring all the instances down on one node, including the ASM instance.
- Perform the scan-order changes in /etc/sysconfig/oracleasm.
- Restart ASMLib (stop and start).
- Start the instances.
- Repeat the process on all the other nodes of the cluster.

CHALLENGES
- Application Team sign-off, and the activity to be done with zero downtime.
- Databases on different operating systems.

SCOPE
- ASM scan-order change in the /etc/sysconfig/oracleasm file.
- Ensure ASM uses multipath/PowerPath after the change.
- Ensure storage works fine without any I/O errors after the change.

BUSINESS VALUE DELIVERED
- Number of I/O failures reduced to zero, from 4-5 failures per month.
- Number of outages due to I/O issues related to multipath/PowerPath reduced to zero.
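The scan-order change itself is a small edit to the ASMLib configuration. A sketch of the two relevant /etc/sysconfig/oracleasm parameters, with illustrative values: prefixes in ORACLEASM_SCANORDER are scanned first ("dm" matches device-mapper multipath pseudo-devices, "emcpower" EMC PowerPath), and ORACLEASM_SCANEXCLUDE skips the underlying single-path "sd" devices:

```
# /etc/sysconfig/oracleasm (excerpt) -- illustrative values
# Scan multipath pseudo-devices first, never the raw single paths.
ORACLEASM_SCANORDER="dm emcpower"
ORACLEASM_SCANEXCLUDE="sd"
```

After this change, ASMLib stamps the multipath device rather than one leg of the path, so a storage-device failover no longer surfaces as an I/O failure to ASM.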


Case Study: Automation of Monthly Database Recycle

ABOUT THE ENGAGEMENT
The client is a media and entertainment giant engaged in the production and marketing of entertainment, news, and information products and services to a global customer base. The IT backbone supporting the organization is a complex network with a few thousand servers spread across multiple platforms, ranging from legacy to open source. Around 1,000 databases are currently in use for various applications.

BUSINESS NEED
Every month the database has to be recycled, with notifications to the Application Teams before and after the activity, as per the scheduled and approved maintenance window.

APPROACH
- The script verifies the day, date, and time on the server and matches them against the approved time already coded.
- The script sends a notification 2 days prior to the activity.
- The script sends a notification 30 minutes prior to the activity.
- The script recycles the database node by node.
- The script sends a notification once the activity is complete.

CHALLENGES
- The script must run based on the day, date, and time.
- The database has to be recycled node by node with no downtime: at any point during the activity only one node may be down, while the other nodes have to be up and running.

SCOPE
- The databases to be recycled one node at a time.
- Email notifications to be sent to the App and Database Teams before and after the activity is performed.

BUSINESS VALUE DELIVERED
- No manual intervention needed.
- Saved man-hours: 4 hrs x 10 = 40 hrs per month (the 4 hrs include the recycle activity, notifications, and getting approval).
- Human errors reduced to zero.
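The "verify the day, date and time" step amounts to a guard function that only lets the recycle proceed inside the approved window. A minimal sketch, assuming a hypothetical window (first Sunday of the month, 02:00-06:00) since the real approved schedule is not stated:

```python
from datetime import datetime

def in_approved_window(now):
    """Return True only inside the coded maintenance window.

    Assumed window: first Sunday of the month, 02:00-06:00 server time."""
    first_sunday = now.day <= 7 and now.weekday() == 6  # Monday == 0
    return first_sunday and 2 <= now.hour < 6

# 1 May 2011 was a Sunday: inside vs. outside the window.
print(in_approved_window(datetime(2011, 5, 1, 3, 0)))
print(in_approved_window(datetime(2011, 5, 1, 7, 0)))
```

The full script would wrap this guard around the node-by-node restart loop and the pre/post e-mail notifications, exiting without action whenever the check fails.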


Case Study: DBMS_STATS Project

ABOUT THE ENGAGEMENT
The client is a media and entertainment giant engaged in the production and marketing of entertainment, news, and information products and services to a global customer base. The IT backbone supporting the organization is a complex network with a few thousand servers spread across multiple platforms, ranging from legacy to open source. Around 1,000 databases are currently in use for various applications.

BUSINESS NEED
Oracle 10g databases were still gathering statistics with the ANALYZE method carried over from the 9i databases. The need is to implement DBMS_STATS for all non-default schemas and remove the current ANALYZE jobs.

APPROACH
- Verify the ANALYZE job details and configuration.
- Delete the existing statistics collected by the ANALYZE job.
- Gather statistics using DBMS_STATS scripts, as recommended for 10g.

CHALLENGES
- The App Team was reluctant to change the stats-collection jobs to the new method because they did not want changes to the current infrastructure.
- Removal of the current jobs and scheduling of the new ones.

SCOPE
- Implement the new stats-collection job on all Oracle 10g instances.
- Remove the existing ANALYZE jobs.

BUSINESS VALUE DELIVERED
- 20% reduction in the number of outages due to performance issues.
- Overall better database performance across more than 40 apps.
- The App Team acknowledged the better database performance.
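With ~1,000 databases in play, the per-schema DBMS_STATS commands are best generated rather than hand-written. A sketch that emits the delete-then-gather pair for each non-default schema (the default-schema list here is partial and illustrative):

```python
# Partial, illustrative list of default schemas to leave alone.
DEFAULT_SCHEMAS = {"SYS", "SYSTEM", "OUTLN", "DBSNMP"}

def stats_commands(schemas):
    """Generate the PL/SQL calls that move a schema from ANALYZE-based
    statistics to DBMS_STATS: drop the old stats, then regather."""
    cmds = []
    for s in sorted(set(schemas) - DEFAULT_SCHEMAS):
        cmds.append(f"EXEC DBMS_STATS.DELETE_SCHEMA_STATS(ownname => '{s}');")
        cmds.append(
            f"EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => '{s}', "
            f"cascade => TRUE);"
        )
    return cmds

for cmd in stats_commands(["APP1", "SYSTEM"]):
    print(cmd)
```

`cascade => TRUE` gathers index statistics along with the table statistics, which replaces what the old ANALYZE jobs covered.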


Case Study: Live Re-Org

ABOUT THE ENGAGEMENT
The client is a media and entertainment giant engaged in the production and marketing of entertainment, news, and information products and services to a global customer base. The IT backbone supporting the organization is a complex network with a few thousand servers spread across multiple platforms, ranging from legacy to open source. Around 1,000 databases are currently in use for various applications.

BUSINESS NEED
Database tables with huge amounts of data and frequent updates and deletes are expected to get fragmented. There is a need to de-fragment these tables, which improves performance and also releases space.

APPROACH
- Identify the tables, and their dependents, which are highly fragmented, using the dba_segments view and dba_tables.avg_row_len.
- Use QSM (Quest Space Manager) to reorg the tables and release the space.

CHALLENGES
- Perform the activity online, while the application is actively using the tables.
- Reduce the amount of archives (change logs) generated during the re-org of a table.
- Improve the read performance of the tables.

SCOPE
- Defragment the tables.
- Improve read performance and reduce I/O.

BUSINESS VALUE DELIVERED
- Reclaimed 40% of the space each time a reorg was performed.
- Performed online reorgs with zero downtime to the application and database.
- Historical reorganization activity tracked for proactive analysis and planning.
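The candidate-selection step compares allocated segment size (dba_segments) with the space the rows should actually need (num_rows x avg_row_len from dba_tables). A rough sketch of that estimate; the 15% overhead factor for block headers and PCTFREE is an assumption:

```python
def wasted_pct(segment_bytes, num_rows, avg_row_len, overhead=1.15):
    """Rough fragmentation estimate: percentage of the allocated segment
    that the current rows do not need. High values flag reorg candidates."""
    if segment_bytes == 0:
        return 0.0
    needed = num_rows * avg_row_len * overhead
    return max(0.0, round(100.0 * (1 - needed / segment_bytes), 1))

# 10 GB allocated, but 20M rows of ~100 bytes only need ~2.3 GB:
# a strong reorg candidate.
print(wasted_pct(10 * 2**30, 20_000_000, 100))
```

In the live process this calculation would run over a query against dba_segments joined to dba_tables, and the flagged tables would be fed to the online reorg tool.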


Case Study: 9i to 11g Upgrade with Unicode Conversion

ABOUT THE ENGAGEMENT
The client is a media and entertainment giant engaged in the production and marketing of entertainment, news, and information products and services to a global customer base. The IT backbone supporting the organization is a complex network with a few thousand servers spread across multiple platforms, ranging from legacy to open source. Around 1,000 databases are currently in use for various applications.

BUSINESS NEED
An SAP application upgrade demanded a database upgrade from 9i to 11g with Unicode conversion (Unicode conversion supports multiple character sets). The 9i database was on AIX, but the business demanded that the 11g database be on Solaris.

APPROACH
- Upgraded the database from 9i on AIX to 10g on AIX.
- Used a recovery point for database restoration from 9i to 10g.
- Split the large tables.
- Performed the export of the large tables first and of the other tables later.
- Created the new 11g database on the Solaris server.
- Performed the import of the tables along with index creation; we had automated the export/import and the index creation.

CHALLENGES
- Perform an upgrade from 9i on AIX to 11g on Solaris.
- Database size: 8 TB.
- Maintenance window provided: 30 hours.
- Need to move the data and indexes within the maintenance window.
- Need to perform the Unicode conversion within the window.

SCOPE
- Database upgrade from Oracle 9i to 11g, to support the application upgrade, with Unicode conversion.
- Change in operating system.

BUSINESS VALUE DELIVERED
- The database upgrade to 11g resulted in a high performance gain and support for a different character set.
- Database performance improved by 20%: queries perform well and reports complete faster.
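The "large tables first" ordering is what lets an 8 TB export fit a 30-hour window: the big tables each get their own (parallelizable) export job, while the long tail is batched together. A sketch of that scheduling decision; the table names, sizes, and 100 GB threshold are all illustrative:

```python
def export_batches(table_sizes_gb, size_threshold_gb=100):
    """Order the export: each big table in its own batch, largest first,
    then all the small tables grouped into one final batch."""
    big = sorted(
        (t for t, gb in table_sizes_gb.items() if gb >= size_threshold_gb),
        key=table_sizes_gb.get, reverse=True,
    )
    small = sorted(t for t, gb in table_sizes_gb.items()
                   if gb < size_threshold_gb)
    return [[t] for t in big] + ([small] if small else [])

sizes_gb = {"FACT_SALES": 900, "FACT_CLICKS": 400, "DIM_DATE": 1, "DIM_GEO": 2}
print(export_batches(sizes_gb))
```

Each batch would then be handed to its own export process, with imports and index builds kicked off as soon as a batch lands on the Solaris side.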


Case Study: Splitting the Partition in 10g

ABOUT THE ENGAGEMENT
The client is a media and entertainment giant engaged in the production and marketing of entertainment, news, and information products and services to a global customer base. The IT backbone supporting the organization is a complex network with a few thousand servers spread across multiple platforms, ranging from legacy to open source. Around 1,000 databases are currently in use for various applications.

BUSINESS NEED
Table XYZ, of size 700 GB, is a partitioned table with two partitions of 50 GB and 650 GB. The bigger partition takes 4+ hours for stats collection, which was resulting in outages. High load was observed on the server due to stats collection on this table, which was slowing the application down and causing outages.

APPROACH
- Created year-wise test tables and deleted the same data from the actual table.
- Split the bigger partition for the year whose data was deleted; the split-partition execution was faster because there was no data left for it in the actual table.
- Exchanged the test-table data with the new partition.
- Rebuilt the indexes.
- Performed all the steps through automated scripts.

CHALLENGES
- Split partition is a very expensive and time-consuming task.
- Managing space in the data tablespace, temporary tablespace, and undo tablespace.
- Very slow performance due to the huge data volume and a low server configuration.

SCOPE
- Split the partition to reduce the partition size; we did it in parallel with NOLOGGING to make it faster.
- Gain in query performance.
- Reduce stats-collection time on the server.

BUSINESS VALUE DELIVERED
- Stats-collection time reduced from 4-5 hours to 1 hour, an 80% reduction.
- Number of outages reduced to zero, from 2-3 outages monthly.
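Since all the steps ran through automated scripts, the split DDL itself is a natural candidate for generation. A sketch of what the generator might emit; the partition names, year boundary, tablespace, and parallel degree are all illustrative, with NOLOGGING on the new partition and PARALLEL mirroring the approach above:

```python
def split_partition_ddl(table, partition, split_year, new_partition,
                        tablespace):
    """Generate an Oracle split-partition statement for the year-wise
    split described above. All names and values are illustrative."""
    return (
        f"ALTER TABLE {table} SPLIT PARTITION {partition} "
        f"AT (TO_DATE('01-JAN-{split_year}','DD-MON-YYYY')) "
        f"INTO (PARTITION {new_partition} TABLESPACE {tablespace} NOLOGGING, "
        f"PARTITION {partition}) "
        f"UPDATE INDEXES PARALLEL 8;"
    )

print(split_partition_ddl("XYZ", "P_BIG", 2010, "P_2009", "TS_DATA"))
```

Because the matching rows had already been deleted from the actual table, the new partition is created empty and the exchange with the pre-loaded test table does the data movement instead.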


Case Study: Table Compression

ABOUT THE ENGAGEMENT
The client is a media and entertainment giant engaged in the production and marketing of entertainment, news, and information products and services to a global customer base. The IT backbone supporting the organization is a complex network with a few thousand servers spread across multiple platforms, ranging from legacy to open source. Around 1,000 databases are currently in use for various applications.

BUSINESS NEED
Database tables grow in size over time with archival data. There is a need to reduce the size of the databases and tables, and in turn the backup size, which will improve the manageability of the database.

APPROACH
- Identify the tables which are read-mostly.
- Compress those tables using the ALTER TABLE ... COMPRESS option, which compresses the data at the database block level.

CHALLENGES
- Identify the tables which are read-mostly and can be compressed.
- Update and insert operations on a table after compression do not benefit from the compression (direct load is the exception).
- Sign-off from the Application Teams.

SCOPE
- Release space from the mostly-read-only tables.
- Improve I/O performance.

BUSINESS VALUE DELIVERED
- Up to 3x disk savings.
- Faster full-scan and range-scan operations.
- Reduced network traffic, because Oracle compresses and decompresses the data, so the network packets are significantly smaller.
- Up to 3x reduction in backup size.
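Picking the read-mostly candidates and emitting the compression DDL can be scripted. A sketch, assuming read/write counts are available from monitoring (e.g. segment statistics); the table names and the 95% read-ratio threshold are illustrative:

```python
def compression_ddl(table_stats, min_read_ratio=0.95):
    """From {table: (reads, writes)} monitoring data, pick the
    read-mostly tables and emit block-level compression DDL."""
    ddl = []
    for table, (reads, writes) in sorted(table_stats.items()):
        total = reads + writes
        if total and reads / total >= min_read_ratio:
            # MOVE rewrites the existing blocks so old rows get
            # compressed too, not just future direct loads.
            ddl.append(f"ALTER TABLE {table} MOVE COMPRESS;")
    return ddl

stats = {"SALES_HIST": (9900, 10), "ORDERS_LIVE": (5000, 4000)}
print(compression_ddl(stats))
```

Note that MOVE invalidates the table's indexes, so a real script would append the corresponding `ALTER INDEX ... REBUILD` statements as well.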


Case Study: Tempdb Threshold Change

ABOUT THE ENGAGEMENT
The client is a media and entertainment giant engaged in the production and marketing of entertainment, news, and information products and services to a global customer base. The IT backbone supporting the organization is a complex network with a few thousand servers spread across multiple platforms, ranging from legacy to open source. Around 1,000 databases are currently in use for various applications.

BUSINESS NEED
Huge number of alerts on tempdb log space usage. (Any data points on how many alerts?)

APPROACH
- Identified the number of servers and analyzed the alert pattern.
- Researched the vendor site and identified that the tempdb log file is truncated automatically by SQL Server itself.
- Increased the alert threshold on tempdb from 65% to 75%. (What did this do?)

CHALLENGES
- Analyze the huge number of alerts and identify the root cause.

SCOPE
- Reduce the number of alerts on tempdb log space on the servers.

BUSINESS VALUE DELIVERED
- Non-value work for the DB Team removed. (How much time saved?)
- Number of alerts for tempdb log file usage reduced by 95% {from ?}.
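The effect of the threshold change is easy to see against polled usage data: tempdb's log routinely hovers just above 65% before SQL Server truncates it itself, so moving the threshold to 75% silences the noise without hiding real growth. A sketch with hypothetical polling samples:

```python
def alerts_fired(usage_samples_pct, threshold_pct):
    """Count how many polled samples would page the DBA at a threshold."""
    return sum(1 for u in usage_samples_pct if u >= threshold_pct)

# Hypothetical polling data: the log hovers in the high 60s/low 70s
# because SQL Server truncates it before it becomes a real problem.
samples = [66, 68, 70, 72, 66, 67, 69, 74, 76, 71]
print(alerts_fired(samples, 65), alerts_fired(samples, 75))
```

Replaying the historical samples through the candidate threshold, as here, is a cheap way to predict the alert reduction before changing the monitoring config.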


Back-up Mechanism Revamp of Sybase Servers

Current Backup Mechanism
- Backup mechanism used: Sybase uncompressed backups, with the files gzipped afterwards.
- Stripes: 13 files for each database.
- Number of jobs per server: 1 job for each database (e.g. 10 jobs for 10 app databases), 1 job for clearing old backup files, and 1 job for all system databases.

Root Cause Analysis
- The job does not have a proper error-identification mechanism.
- No notification of job success or failure.
- Log files are overwritten each day.

Problem
- Time utilized for backing up is huge (e.g. a 65 GB database takes up to 4 hours to back up).
- Total number of Ctrl-M jobs for 5 servers: 136.
- Inconsistent back-up mechanism.
- Multiple jobs running for back-up.
- High utilization of system resources.
- No tracking mechanism.

[Chart: before vs. after comparison of number of jobs, servers, stripes, and time taken (mins).]

Solution: Single Globalized Back-up Script
- Solution leveraged from the Genpact DB Team.
- A single globalized back-up script reduces complexity and uses a robust back-up process to reduce system-resource utilization.
- A single script to alert, rotate, compress, and purge (as per the NBCU log-retention policy).
- Effective utilization of the time to run the back-ups.
- Automated dashboard (capacity vs. utilization) and trend analysis of the log growth rate.

Before
- Mechanism used: Sybase uncompressed backups; gzips the files.
- 13 files (stripes) for each database.
- Total Ctrl-M jobs for 5 servers: 136.
- Time taken is very high (e.g. a 65 GB database takes up to 4 hours for a complete backup).
- Time taken to back up one server is approx. 14+ hours.

After
- Mechanism used: Sybase native compressed backups.
- 6 files (stripes) for each database.
- Number of jobs per server: 1.
- Total Ctrl-M jobs for 5 servers: 5.
- Advantages:
  - Email alert with the job status.
  - The job has an error mechanism to report the error returned.
  - Time taken is less (e.g. a 65 GB database takes about 30 minutes to back up).
  - Log files are not overwritten; they are kept for 20 days, after which they are deleted by the script.
  - The time taken for each database backup is recorded.

A consistent and robust back-up script, saving on time and resources.
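The core of the single globalized script is building the native compressed, striped dump command per database instead of one hand-maintained job each. A sketch of that command generation, assuming Sybase ASE's `compress::level::path` dump syntax; the directory, file naming, and compression level are illustrative:

```python
def dump_command(database, backup_dir, stripes=6, compress_level=1):
    """Build a Sybase ASE native compressed dump with N stripes,
    as used by the revamped single script. Paths are illustrative."""
    files = [
        f"compress::{compress_level}::{backup_dir}/{database}.dmp.{i}"
        for i in range(1, stripes + 1)
    ]
    cmd = f'dump database {database} to "{files[0]}"'
    for f in files[1:]:
        cmd += f'\n    stripe on "{f}"'
    return cmd

print(dump_command("appdb", "/backups"))
```

Looping this generator over the server's database list is what collapses the per-database Ctrl-M jobs into the single job per server described above; the script's own log then records the alerting, rotation, and per-database timing around each dump.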
