APPROACH
Automate the deadlock information report from the error log.
Report to be generated as soon as a deadlock occurs on the SQL Server.
Killed with or without information? What processes are these and what are they used for?
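The report automation above could be sketched roughly as follows. This is a minimal illustration, not the team's actual script: it assumes the error log contains deadlock output in the style written by trace flag 1222 (a "deadlock-list" line followed by "deadlock victim=" and "process id=" lines). The sample log lines, host names and the function name extract_deadlocks are hypothetical.

```python
import re

# Hypothetical sample lines; real error log content depends on server configuration.
SAMPLE_LOG = """\
2010-05-01 10:15:02.10 spid18s     deadlock-list
2010-05-01 10:15:02.10 spid18s      deadlock victim=process3a5b8
2010-05-01 10:15:02.11 spid18s       process id=process3a5b8 spid=61 hostname=APP01
2010-05-01 10:15:02.11 spid18s       process id=process4c9d0 spid=74 hostname=APP02
2010-05-01 10:20:00.00 spid52      Unrelated informational message
2010-05-01 11:02:45.30 spid21s     deadlock-list
2010-05-01 11:02:45.30 spid21s      deadlock victim=process77aa1
2010-05-01 11:02:45.31 spid21s       process id=process77aa1 spid=80 hostname=APP03
"""

# date + time, source (spid), then the message text
LINE_RE = re.compile(r"^(\S+ \S+)\s+(\S+)\s+(.*)$")


def extract_deadlocks(log_text):
    """Collect one record per 'deadlock-list' entry found in the log text."""
    deadlocks = []
    current = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue
        timestamp, _source, message = match.groups()
        message = message.strip()
        if message == "deadlock-list":
            current = {"time": timestamp, "victim": None, "processes": []}
            deadlocks.append(current)
        elif current is None:
            continue  # nothing to attach this line to yet
        elif message.startswith("deadlock victim="):
            current["victim"] = message.split("=", 1)[1]
        elif message.startswith("process id="):
            # turn "id=... spid=... hostname=..." into a dict
            fields = dict(p.split("=", 1) for p in message.split() if "=" in p)
            current["processes"].append(fields)
    return deadlocks


def format_report(deadlocks):
    """Render the records as plain text suitable for an email body."""
    lines = []
    for number, d in enumerate(deadlocks, start=1):
        lines.append(f"Deadlock {number} at {d['time']}, victim {d['victim']}")
        for p in d["processes"]:
            lines.append(f"  spid {p.get('spid')} from {p.get('hostname')}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(extract_deadlocks(SAMPLE_LOG)))
```

Running such a parser on a schedule, or triggering it from an alert, would remove the manual collection and formatting step; how the report is delivered to the app team is left open here.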
CHALLENGES
Application Team raises a ticket with the Database Team. There is a considerable delay in raising a ticket.
Database Team takes a 24-hour SLA to send the report.
Database Team collects the data manually from the error log, formats it and sends it to the app team.
Time taken by the Database Team per deadlock: 45 minutes.
SCOPE
To reduce the Non-Value Add work from Database Team.
To reduce the turn-around time for getting information to the app team.

Non-Value Work for DB Team removed. How much time saved?
Number of Alerts reduced to Zero for deadlocks {from ?}
Number of tickets reduced to Zero for deadlock information {from ?}
Risk Reduction: DB Team do NOT need to log into the server to get this information.
BUSINESS NEED
Windows Clusters Servers are patched every month. Services
APPROACH
Conference call set up during the patching window between the database and compute teams to ensure the resources are on their resident nodes.
Compute Team has been provided a list of resources and their resident nodes as a checklist.
Database Team has an alert set up for resources NOT on their resident nodes.
Auto-Reboot and Auto-Update features disabled to ensure all clusters are on the same patch level.
CHALLENGES
Compute Teams patching 100s of servers at the same time.
No alerting mechanism if servers are on the wrong node.
Outages due to resources NOT on their resident node.
High number of Memory/CPU alerts due to resources NOT on their resident node.
Auto-Reboot and Auto-Update features enabled on several servers, leading to inconsistency between patch levels.
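A resident-node check of the kind described here could look roughly like the sketch below. It assumes the checklist is available as a mapping from resource name to resident node and that the current owner of each resource has already been collected by some other means. The resource names, node names and the function find_misplaced are hypothetical.

```python
# Hypothetical checklist: cluster resource -> resident (preferred) node.
RESIDENT_NODES = {
    "SQLGRP01": "NODE-A",
    "SQLGRP02": "NODE-B",
    "SQLGRP03": "NODE-A",
}


def find_misplaced(resident_nodes, current_owners):
    """Return (resource, expected_node, actual_node) for every resource
    that is not currently on its resident node.

    A resource missing from current_owners is reported with actual_node None.
    """
    misplaced = []
    for resource, expected in sorted(resident_nodes.items()):
        actual = current_owners.get(resource)
        if actual != expected:
            misplaced.append((resource, expected, actual))
    return misplaced


if __name__ == "__main__":
    # Hypothetical state after a patching window: SQLGRP02 failed over.
    owners_after_patching = {
        "SQLGRP01": "NODE-A",
        "SQLGRP02": "NODE-A",
        "SQLGRP03": "NODE-A",
    }
    for resource, expected, actual in find_misplaced(RESIDENT_NODES, owners_after_patching):
        print(f"ALERT: {resource} is on {actual}, expected {expected}")
```

Keeping the checklist as data means the same comparison can serve both the compute team's post-patch verification and the database team's alert.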
SCOPE
To reduce the Non-Value Add work from Database Team.
To reduce the outages due to resources not on resident node.
To reduce the number of alerts due to resources not on resident node.

Number of Alerts reduced to Zero for CPU and Memory - {from ?}
Number of Outages reduced to Zero due to resource NOT on resident node and Auto Reboots due to Auto Updates - {from ?}
ASM does NOT use multipath or PowerPath pseudo devices. Instead it uses single devices, which creates a single point of failure in RAC environments and causes IO failure during storage device failover.

Bring all the instances down on one node, including the ASM instance.
Perform the scan order changes in /etc/sysconfig/oracleasm.
Start and stop ASMLIB.
Start the instances.
Repeat the process on all other nodes of the cluster.
CHALLENGES
Application Team Sign-Off and the activity to be done with zero
APPROACH
Script to verify the day, date and time of the server and match them with the approved time already coded.
Script to send notification 2 days prior to the activity.
Script to send notification 30 mins prior to the activity.
Script to recycle node by node.
Script to send notification once the activity is complete.
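The scripted flow above can be illustrated with a small sketch. It is not the production script: the approved start time, the window length, the node names and all function names are hypothetical, and the stop, start and health-check actions are passed in as callables because the real commands are not given here.

```python
from datetime import datetime, timedelta

# Hypothetical approved window, "already coded" into the script.
APPROVED_START = datetime(2011, 3, 12, 22, 0)
APPROVED_DURATION = timedelta(hours=4)


def notification_times(start):
    """When the two advance notifications should go out."""
    return {
        "two_days_prior": start - timedelta(days=2),
        "thirty_mins_prior": start - timedelta(minutes=30),
    }


def in_approved_window(now, start=APPROVED_START, duration=APPROVED_DURATION):
    """True only if the server clock falls inside the approved window."""
    return start <= now < start + duration


def recycle_node_by_node(nodes, stop, start, is_up):
    """Recycle one node at a time so the other nodes keep serving.

    Stops as soon as a node fails to come back, leaving the remaining
    nodes untouched.
    """
    recycled = []
    for node in nodes:
        stop(node)
        start(node)
        if not is_up(node):
            raise RuntimeError(f"{node} did not come back; aborting recycle")
        recycled.append(node)
    return recycled


if __name__ == "__main__":
    print(notification_times(APPROVED_START))
    if in_approved_window(datetime.now()):
        recycle_node_by_node(
            ["node1", "node2"],
            stop=lambda n: print("stopping", n),
            start=lambda n: print("starting", n),
            is_up=lambda n: True,
        )
    else:
        print("Outside approved window; nothing done.")
```

Checking the clock before doing anything is what lets the job be scheduled safely without manual intervention: a run outside the approved window simply does nothing.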
Database has to be recycled node by node with no downtime. At any point in time only one node will be down during this activity; the other available nodes have to be up and running.
SCOPE
The databases to be recycled one node at a time.
Email notifications to be sent to App and Database Teams prior to the activity.
BUSINESS VALUE DELIVERED
No manual intervention needed.
Saved man hours: 4 hrs * 10 = 40 hrs per month (4 hrs includes activity or recycle, notification and also getting approval).
Human errors reduced to zero.
issues.
Overall better database performance in more than 40 apps.
App Team acknowledged better database performance.
SCOPE
Implement the new stats collection job in all Oracle 10G
BUSINESS NEED
Database tables with huge amounts of data and frequent updates and deletes are expected to get fragmented. There is a need to de-fragment the tables, which will improve performance and also release space.
APPROACH
Identify the tables and their dependents which are highly fragmented, using the database views dba_segments and dba_tables.avg_row_len.
Use QSM (Quest Space Manager) to reorg the tables and release the space.
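One rough way to use those two views is to compare the space a segment has allocated with the space its rows would need. The sketch below assumes the per-table figures (bytes from dba_segments, num_rows and avg_row_len from dba_tables) have already been queried into a list of dictionaries. The table names, the 30% threshold and the function name are hypothetical, and the estimate ignores block overhead and PCTFREE, so it only ranks candidates.

```python
def estimate_fragmentation(rows, min_wasted_pct=30.0):
    """Rank tables by the share of allocated space not explained by row data.

    Each row needs: table_name, segment_bytes (dba_segments.bytes),
    num_rows and avg_row_len (dba_tables).
    """
    report = []
    for r in rows:
        allocated = r["segment_bytes"]
        if allocated <= 0:
            continue
        used = r["num_rows"] * r["avg_row_len"]
        wasted_pct = 100.0 * (allocated - used) / allocated
        if wasted_pct >= min_wasted_pct:
            report.append((r["table_name"], round(wasted_pct, 1)))
    # most fragmented first
    return sorted(report, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    sample = [
        {"table_name": "ORDERS", "segment_bytes": 1_000_000_000,
         "num_rows": 2_000_000, "avg_row_len": 100},
        {"table_name": "CUSTOMERS", "segment_bytes": 100_000_000,
         "num_rows": 900_000, "avg_row_len": 100},
        {"table_name": "AUDIT_LOG", "segment_bytes": 500_000_000,
         "num_rows": 3_000_000, "avg_row_len": 100},
    ]
    for name, pct in estimate_fragmentation(sample):
        print(f"{name}: about {pct}% of allocated space looks reclaimable")
```

The figures depend on current optimizer statistics, since num_rows and avg_row_len are only as fresh as the last stats collection.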
CHALLENGES
Need to perform the activity online while the application is actively using the tables.
Reduce the amount of archives (change logs) generated during re-org of a table.
Improve the read performance of the tables.
database Tracking historical reorganization activity for proactive analysis and planning.
SCOPE
Defragment the tables.
Improve the read performance and reduce IO.
BUSINESS NEED
SAP application upgrade demanded a database upgrade from 9i to 11g with Unicode conversion (Unicode conversion will support multiple character sets). The 9i database was on AIX but the business demanded the 11g database to be on Solaris.
APPROACH
Upgraded database from 9i on AIX to 10g on AIX.
Used recover point for database restoration from 9i to 10g.
Split the large tables.
Performed export for large tables first and other tables later.
Created new 11g database on Solaris server.
Performed import of tables along with index creation. We had automated export/import and index creation.
CHALLENGES
Perform an upgrade from 9i on AIX to 11g on Solaris.
Database size: 8TB.
Maintenance window provided = 30 hours.
Need to move data and indexes in the maintenance window.
Need to perform Unicode conversion in the window.
BUSINESS VALUE DELIVERED
Database upgrade to 11g resulted in high performance gain and different character set support.
Database performance improved by 20% because queries are performing well and reports are completed faster.
SCOPE
Database upgrade from Oracle 9i to 11g.
To support application upgrade with Unicode conversion.
Change in operating system.
APPROACH
Created the year-wise test tables and deleted the same data from the actual table.
Split the bigger partition for the year whose data was deleted, which made the split partition execution faster because there was no data in the actual table.
Exchanged the test table data with the new partition.
Rebuilt the indexes.
All the steps were performed through automated scripts.

50G and 650G
The bigger partition takes 4+ hours for collecting stats, which was resulting in an outage.
High load observed on the server due to stats collection on this table, which was causing the application to work slowly and was causing an outage.
CHALLENGES
Split partition is a very expensive and time-consuming task.
Managing space in the Data tablespace, Temporary tablespace and Undo tablespace.
Very slow performance due to huge data and lower server configuration.
SCOPE
Split partition to reduce the partition size. We did it in parallel with NOLOGGING to do it faster.
Gain in query performance.
Reduce stats collection time on the server.
now it is 1 hour.
BUSINESS NEED
Database tables grow in size over time with archival data. There is a need to reduce the size of the databases and tables and in turn reduce the backup size, which will improve the manageability of the database.
APPROACH
Identify the tables which are Read Mostly.
Compress the tables using the alter table.. Compress option to compress the data at the database block level.
CHALLENGES
Identify the tables which are Read-Mostly and can be compressed.
Update / insert operations on the table after compression do not get the benefit of compression (direct load is the exception).
Sign-off from Application Teams.
decompresses the data so the network packets will be significantly smaller
Up to 3x reduction in backup size
SCOPE
Release space from the Mostly Read Only tables.
Improve the IO performance.
BUSINESS NEED
Huge Number of Alerts on Tempdb Log Space Usage. Any data
APPROACH
Identify the number of servers and analyze the pattern.
Researched vendor site and identified that the Tempdb log file gets truncated automatically by SQL Server itself.
Increased the alert threshold on tempdb from 65% to 75%.
What did this do?
CHALLENGES
Analyze the huge number of alerts and identify the root cause
SCOPE
To reduce the number of alerts on tempdb log space on servers
? Number of Alerts reduced by 95% for Tempdb Log File Usage {from ?}
Before / After
Problem
Time utilized for backing up is huge (e.g. a 65 GB database takes up to 4 hours to back up).
Total # of Ctrl-M jobs for 5 servers - 136
[Chart: # Jobs, Servers, Stripes, Time Taken]
Problem
Inconsistent backup mechanism.
Multiple jobs running for backup.
High utilization of system resources.
No tracking mechanism.
mins
process to reduce system resource utilization
Single script to Alert, Rotate, Compress & Purge (as per NBCU Log Retention Policy)
Effective utilization of time to run the backups
Automated dashboard (Capacity vs Utilization) & trend analysis of logs growth rate
Before
Sybase Uncompressed Backup
Mechanism used
Gzips the files.
13 files (stripes) for each database.
Total of Ctrl-M jobs for 5 servers: 136.
Time taken is very high (e.g. a 65 GB database takes up to 4 hours for a complete backup).
Time taken to back up one server is approx. 14+ hours.

After
Sybase Native Compressed Backups
Mechanism used
6 files (stripes) for each database.
Number of jobs per server: 1.
Total of Ctrl-M jobs for 5 servers: 5.
Advantages:
Email alert with the job status.
Job has an error mechanism to report the error returned.
Time taken is less (e.g. a 65 GB database takes about 30 min to back up).
Log files are not overwritten and are kept for 20 days, after which they are deleted by the script.
Time taken for each database backup is recorded.
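The 20-day log retention mentioned above could be implemented along the lines of the sketch below. It is only an illustration of the purge step, not the actual backup script: the directory layout and the function name purge_old_logs are hypothetical, and age is judged by file modification time.

```python
import os
import time

RETENTION_DAYS = 20  # retention period stated for the backup log files


def purge_old_logs(directory, retention_days=RETENTION_DAYS, now=None):
    """Delete regular files in `directory` last modified more than
    `retention_days` ago and return the names that were removed."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # seconds per day
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    import tempfile

    # Demonstrate on a throwaway directory rather than a real log location.
    with tempfile.TemporaryDirectory() as log_dir:
        old_file = os.path.join(log_dir, "backup_old.log")
        new_file = os.path.join(log_dir, "backup_new.log")
        for path in (old_file, new_file):
            with open(path, "w") as handle:
                handle.write("backup log\n")
        # Age one file to 21 days old.
        aged = time.time() - 21 * 86400
        os.utime(old_file, (aged, aged))
        print("removed:", purge_old_logs(log_dir))
```

Writing each run to a new, dated log file and purging by age is what keeps logs from being overwritten while still bounding disk usage.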