
CASE STUDY

Reducing IT User Downtime Using TQM: A Case Study


This Information Technology (IT) case study was done during the implementation of Total Quality Management (TQM) in a financial services company with several hundred computers and computer users in multiple locations throughout India. The results have widespread applicability and are aimed in particular at organizations with large computer networks, IT facilities management companies and customer service providers. Success in any improvement effort is a function of techniques accompanied by a mindset change in the organization. This project was undertaken as part of the second wave of projects aimed at spreading the quality mindset in the organization. The narrative unfolds in the chronological sequence of TQM's Seven Steps of Problem Solving (similar to DMAIC in Six Sigma), describing the critical process stages where results were achieved and mindsets changed.

Step 1 Define the Problem


Selecting the theme: After an initial two-day TQM awareness program, the company's senior management selected a theme by consensus: "Dramatic Improvements in Customer Service." As part of the theme, one of the improvement areas selected was reducing the response time to resolve IT (hardware and software) problems faced by internal customers. The company had outsourced its network and facility management; a small technical services management team and help desk oversaw the vendor's work.

Problem = Customer desire − Actual status: Detailed data was available regarding the time of receipt of each call from the customer (in this case, the network users) and the time of call closure. Monthly management reports aggregated the performance by enumerating the number of calls resolved in the following categories:

Call Closure Time

< 30 minutes
< 60 minutes
< 2 hours
> 2 hours
< 24 hours
< 48 hours
> 48 hours

While the information about what happened was well recorded, there was no information about what users had desired to happen. The deviation from user desires, or even from the service standard promised to users, was not measured. Defining the problem therefore changed the mindset: data went from being just an internal record to a means of measuring and assuring a service standard to the user. The calls were categorized into groups, each with a service standard closure time as defined in the table above. A month of data was analyzed by subtracting the service standard time from the actual time taken to resolve each call. The gaps between the actual closure time and the standard time were a measure of the problem. It was clear that the data needed to be prioritized in order to proceed. A Pareto diagram was drawn (Figure 1). It indicated that two categories, < 30 minutes (67%) and > 120 minutes (27%), constituted 94% of the incoming load. It was decided to attack the < 30 minutes category first.
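As an illustration, a Pareto table like the one behind Figure 1 can be produced with a few lines of pandas. This is a minimal sketch with invented call counts, not the study's actual records:

```python
# Minimal Pareto-analysis sketch (category counts are invented for illustration).
import pandas as pd

# Hypothetical number of calls per service-standard category for one month.
calls = pd.Series({"<30 min": 670, ">120 min": 270, "<60 min": 40,
                   "<24 hr": 15, "<48 hr": 5})

pareto = calls.sort_values(ascending=False).to_frame("calls")
pareto["share_%"] = 100 * pareto["calls"] / pareto["calls"].sum()
pareto["cumulative_%"] = pareto["share_%"].cumsum()
print(pareto)  # the top one or two categories dominate the incoming load
```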

Definition of metrics: In order to define clear metrics, the concept of sigma was introduced to represent variability in the timeliness of service. The group quickly grasped that a 3-sigma standard translates into 99.7 percent on-time performance: (average + 3 sigma) of the actual closure times should be less than the service standard. For the < 30-minute call category this meant: if T30 = average + 3 sigma of the closure times of 30-minute calls, then T30 < 30 minutes gives 99.7 percent on-time performance.

The past month's data revealed: T30 = 239 minutes. The objective was now clearly defined: reduce T30 from 239 minutes to less than 30, i.e., by roughly 87 percent.
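A sketch of how T30 could be computed from raw closure times follows; the helper function and sample data are illustrative, not the company's actual reporting system:

```python
# Sketch: T = mean + 3*sigma of actual closure times, compared to the standard.
from statistics import mean, stdev

def t_metric(closure_minutes: list[float]) -> float:
    """Mean + 3 standard deviations of closure times.

    Under a rough normality assumption, keeping this figure below the
    service standard implies on the order of 99.7 percent on-time closures.
    """
    return mean(closure_minutes) + 3 * stdev(closure_minutes)

closures = [12, 25, 18, 240, 35, 22, 310, 15, 28, 40]  # invented sample calls
t30 = t_metric(closures)
print(f"T30 = {t30:.0f} min; standard met: {t30 < 30}")
```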

Dividing the Task into Phase A and Phase B


Since making such a big reduction was too daunting a task for a team embarking on its first project, and using the concept that improvement occurs step by step, the initial objective, or Phase A, was to reduce T30 by 50 percent. A project charter was drawn up accordingly.

Step 2 (Phase A) Analyze the Problem: The T30 calls were arranged in descending order of actual closure time. Those calls that had taken more than 30 minutes were segregated for analysis. It was recognized that the problem of quality was one of variability, and that the most effective solution would be eliminating the causes of calls with a very high time of closure. Thus, T30 calls that had taken more than 130 minutes were analyzed first (Figure 2).

The top three categories contributed approximately 75 percent of the problem. To sequence the order of attack, the group chose "big and easy" to precede "big and difficult" problems. Using that criterion, "Not Aware of Change Rule" was chosen.

Step 3 (Phase A) Find the Root Cause: In these cases the engineer attending to the call had not closed the call after attending to it. The Five Whys technique was used to determine the root cause: Why had he not closed the call? Why was he not aware that he was supposed to close the call? Why was the procedure of call closure changed without informing him? Why is there no standard operating procedure to inform employees before changing a procedure that affects them?

Step 4 (Phase A) Generate and Test Countermeasure Ideas: Countermeasures were easily identified: first, inform all the engineers; second, develop a standard procedure for informing all users before making a change in procedure that affects them. The engineers were informed of the new procedure.

Step 5 (Phase A) Check the Results: The next three weeks showed a dramatic drop in the T30 value from 239 to 121 minutes. The objective of a 50 percent reduction had been achieved.

Step 6 (Phase A) Standardize the Results: A standard operating procedure was drawn up for future reference. An X-bar control chart (Figure 3) was introduced for routine day-to-day control.

Step 7 (Phase A) Present a Quality Improvement Report: Drawing up the quality improvement report was deferred because the project was being continued to attempt further improvements.

Figure 3: Control Chart for 30-Minute Calls (September)
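The sketch below shows how such a control chart could be generated. It uses individuals-style 3-sigma limits as a simplification of a textbook X-bar chart, and all data points are invented:

```python
# Sketch of a control chart for daily mean call-closure times (invented data).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
daily_means = rng.normal(loc=25, scale=4, size=30)  # 30 days of mean closure times

center = daily_means.mean()
sigma = daily_means.std(ddof=1)

plt.plot(daily_means, marker="o", label="daily mean (min)")
plt.axhline(center, linestyle="--", label="center line")
plt.axhline(center + 3 * sigma, color="red", label="UCL")
plt.axhline(center - 3 * sigma, color="red", label="LCL")
plt.xlabel("Day of month")
plt.ylabel("Mean closure time (min)")
plt.legend()
plt.show()  # points outside the red limits would signal special-cause variation
```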

Phase B to Further Reduce Downtime


Step 2 (Phase B) Analyze the Problem: The second phase of the project, or Phase B, was to reduce the T30 value by 50 percent again, from about 120 minutes to less than 60. The T30 calls that took more than 30 minutes to close were collated and arranged by category in descending order of time to close. There were two categories with the following data:

Category | Calls | Minutes | Minutes/Call
Log-in | 39 | 2,720 | 70
Printing | 16 | 1,672 | 104

Based upon the "big and easy" principle, the group chose to attack the printing problem first. The printing calls were sub-categorized by location and then by solution, since they had already been resolved. Seven of the 16 calls were from Location 1, and seven of the 16 calls had been solved using the same remedy: reinstalling the printer driver.

Step 3 (Phase B) Finding the Root Cause: Why did the printer driver need frequent re-installation? The group brainstormed and generated 10 possible causes. A check sheet to collect data was designed. For the next two weeks, the engineers were asked to record the reason the printer driver needed to be reinstalled each time they attended to such a call.

Figure 4: Control Chart for 30-Minute Calls (October)

When reviewed, the data surprised the group members. It clearly illustrated the superiority of data-based problem-solving over intuitive problem-solving, and it acted as a major mindset changer. The problem, the data showed, was that the printer was going off-line rather than its driver needing reinstallation. Why was the printer going off-line? Brainstorming quickly produced the cause: The machines in use ran three versions of the Windows operating system: 98, 2000 and XP. In Windows 98 there was a problem: if a user tried to print without logging in, the printer would go off-line and the next user would experience the problem. The cause was quickly confirmed as the root cause by one of the members trying to print without logging in.

Step 4 (Phase B) Generate and Implement Countermeasure Ideas: The group discussion produced the idea of a software change to prevent a user from printing without logging in. All the machines running Windows 98 were identified, and the change was implemented. Applying the standard operating procedure developed in Phase A, the group was careful to inform all users of the change before implementing it.

Step 5 (Phase B) Check the Results: The calls were monitored for another two weeks and the results amazed the group. The data showed a dramatic drop in the T30 value from 121 to 47 minutes (Figure 4). A total reduction of 80 percent had been obtained in the T30 value. The question arose: why was the reduction much more dramatic than the Pareto chart data would indicate? There are two reasons:

1. While the problem-solving method identified the vital problems using the calls that took a long time to resolve, there were undoubtedly many calls with the same problem and cause that were attended to within the standard time and therefore did not show in the analysis.

2. The system of daily control chart plotting and review with the engineers and the group raised awareness of timeliness and thereby increased the urgency for a solution.

Step 6 (Phase B) Standardize the Results: A standard procedure was developed and circulated to all regions to implement the change at all locations.

Step 7 (Phase B) Present a Quality Improvement Report: A quality improvement report was written and presented to the Steering Committee.

Future Work and Conclusions


The work of the group is continuing in the following directions:

1. The T30 calls are now being analyzed to further reduce the time. Two interesting solutions are emerging that promise to cut the downtime further.

2. T60 calls are now under study. The average + 3 sigma of closure time for this category has been measured at 369 minutes. Work is being done to reduce it to less than 60 minutes.

This case study demonstrates several principles of TQM and Six Sigma:

1. What cannot be measured cannot be improved. (Establishing service standards and using sigma and control charts for on-time delivery of services were essential in making improvements.)

2. It is important to develop customer-oriented metrics.

3. Mindset change is crucial to the success of any improvement effort.

4. Standardizing the improvement can take longer than the improvement itself. (It is still continuing in this application.)

5. There is value in step-by-step improvement and continuous improvement.

TQM Case Study: Newspaper Focuses on Customer Service


Improving customer service was the focus of two projects within the deployment of Total Quality Management (TQM) in a mid-sized newspaper in India. The projects involved adjusting advertisement deadlines and reducing the number of billing errors. Quality in the TQM method is defined as customer delight. Customers are delighted when their needs are met or exceeded. The needs of the customer are:

Product quality
Delivery quality
Service quality
Cost value

This is the second piece in a three-part series of articles featuring case studies from that deployment; Part 1 of the series featured projects leading to improvements in product quality.

Reducing Advertisement Processing Time


The newspaper closed its window for booking advertisements at 4 p.m. every day. However, many of the newspaper's advertisers expressed that they would be delighted if this limit could be extended to 5 p.m., as they were not able to send ad materials in time for the 4 p.m. deadline. The TQM leaders formed a team consisting of representatives from each link in the ad-processing chain of work. The team attended a two-day quality-mindset program to expose them to the concepts of TQM and to open their minds to experimenting with change.

Defining the Problem

In TQM, problems are defined as Problem = Desire − Current status. Therefore, in this case: Problem = Desired closing time − Current closing time = 5 p.m. − 4 p.m. = 60 minutes. The 4 p.m. deadline had been instituted because:

The deadline for sending the ad pages to the press was 6:30 p.m.
The standard cycle time for processing ads into pages was 2.5 hours.

Achieving a 5 p.m. ad closure deadline meant reducing the standard ad processing time by 40 percent, or one hour. To define the current state, the actual time spent preparing pages to go to press was collected over several days.

Defining the metric: If T = (page processing completion time − page-to-press deadline), then for 99.7 percent on-time delivery, or 3-sigma performance, the average of T + 3 standard deviations of T should be less than 0.

Measure the current state: The ad closing deadline could not be delayed by an hour without delaying the dispatch of the newspaper to press by an equivalent amount. Therefore, the current state was calculated by measuring the delay against a notional 5:30 p.m. dispatch time rather than the actual deadline of 6:30 p.m. Calculations showed that:

Average T = 72 minutes
Average T + 3 sigma of T = 267 minutes

The problem was defined: reduce 267 minutes to less than 0 minutes.
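The same mean + 3 sigma construction used in the IT case applies here, except the target is now negative (pages dispatched before the deadline). A small sketch with invented lateness figures:

```python
# Sketch: lateness T = dispatch time minus notional deadline, in minutes.
from statistics import mean, stdev

# Illustrative daily lateness of page dispatch vs. the notional 5:30 p.m. deadline:
lateness_min = [60, 85, 40, 95, 70, 120, 34]

t = mean(lateness_min) + 3 * stdev(lateness_min)
print(f"mean + 3*sigma of lateness = {t:.0f} min (on-time target: < 0 min)")
```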

Analyzing the Problem

The team monitored the time spent on each activity of the ad process (Table 1).

Table 1: Time Spent on Ad Process

Activity | Deadline
Ad receiving | 4 p.m.
Dummy dump | 4:30 p.m.
Pagination complete | 6:30 p.m.

During the 4 to 4:30 p.m. period, ads received at the last minute were still being processed. At 4:30 p.m., the material was dumped into the layout for pagination, meaning arrangement on the newspaper pages using software and manual corrections. To achieve the objective of a 5 p.m. ad content deadline, the pagination time had to be reduced.

Brainstorming why pagination took two hours produced three possible major reasons:

Error correction
Delayed receipt of ad material for a booked ad
Last-minute updates from the advertiser

All this work was carried out after the last ad was submitted. Team members suggested that if ads were released for pagination earlier, removing errors could begin simultaneously with the processing of the last ads in order to reduce cycle time. They agreed to produce two early outputs, at 3:30 and 4 p.m., before the final dump at 4:30 p.m.

Testing the Ideas

Table 2: Problems with New Process

Problem | Effect | Root Cause | Solution
Missing material removal | 15 to 30 min. | Material delayed or not received | Only feed ads once all materials are received
Error file found after last release | 10 min. | Not checking pre-dump | Check for errors pre-dump
Special placement instructions not followed | 10 min. | Processing team not aware of special instructions | Give instructions as received
Distorted ads in PDF | 15 min. | Ads not corrected before feeding | Correct before feeding; include in SOP
Ads inserted after pagination completion | 20 min. | Ads accepted after deadline | Enforce deadline
Total time savings possible | 70 to 85 min. | |

The process was repeated four times (Table 3).

Table 3: Further Process Observations

Problem | Effect | Root Cause | Solution

Observation 2:
Repeating old practices | | | Reiterate SOPs
Scanning of materials delayed | 45 min. | | Agree on scan turnaround time
PDF conversion problem | 15 min. | Programming problem | IT to resolve
Zip error file not scanned | | | Zip not required

Observation 3:
System failure at peak time | 75 min. | | Use back-up system

Observation 4:
Add-on section integration delayed | 25 min. | | Start integration in pre-dumps; add to SOP

Checking the Results

Nine weeks of continuous implementation yielded dramatic improvement. Average processing time was reduced by an hour, from 72 minutes to 12 minutes. However, the level of variability, although 50 percent lower, was still unacceptable. Analysis of the variability showed that it was largely due to slip-ups in implementing the SOPs.

Standardizing Controls

The team used an X-bar control chart (Figure 1) to monitor and improve performance regularly.

Figure 1: Control Chart of Ad Processing Time

Gradually the performance improved. Two months after implementation, delivery time had progressed from 267 minutes late to 12 minutes early. The deadline for receiving ads could now be relaxed to 5 p.m., delighting the advertisers.

Reducing Customer Complaints


Management indicated that the number of credit notes given to advertisers was too high. Credit notes, issued to rectify errors made in sales invoices, were used to fend off considerable customer annoyance. But this system caused trouble for the paper. Besides increasing non-value-added work, credit notes sometimes resulted in financial loss because customers could use the credit toward ads that had already been booked as sales. During the previous 12 months, the newspaper had received 80 credit notes per week. The team agreed to try to reduce that number by 50 percent in Phase 1.

Finding the Root Causes

About 200 credit notes were examined to determine why they had been issued. Categorization of the causes was charted in a Pareto (Figure 2).

Figure 2: Pareto Chart of Complaints Resulting in Credit

Three causes constituted 84 percent of the problem:

1. Wrong billing: 46 percent
2. Wrong rate: 24 percent
3. Wrong material used: 14 percent

Table 4 shows the root causes of a majority of the credits issued, determined using the 5 Whys method, and their corresponding countermeasures.

Table 4: Explanation of Credit Causes and Countermeasures

1st Why? | 2nd Why? | 3rd Why? | Countermeasure
Wrong billing | Unbilled charge picked up; discount applied incorrectly to all ads in a series | System bug | Bug removed
Wrong rate | Sales scheme not in sales rate card; old scheme continued after the rate card was updated; scheme in rate card but not picked up by system | Sales cards not updated; billing system does not pick up the entry | SOP for updating rate cards
Free ads billed | System does not pick up operator entry | | Modify system to pick up the operator's entry when prompted, rather than automatically taking billing information from the rate table

The team tested the ideas, which resulted in an 80 percent reduction in credit notes, from 80 per week to 14 per week. The process was adopted in regular operation, and the results were documented and presented to senior management.

Change in Thinking

TQM often leads to radical changes in employee mindsets. The improvements resulting from the two customer service-related projects helped to create a team environment in which any change idea is easily accepted, tested and, if it works, implemented.

Fixing Payroll Problems: A TQM Case Study in Human Resources


By Niraj Goyal

A large Indian fast-moving consumer goods company had completed successful Total Quality Management (TQM) projects to improve its manufacturing efficiency, expedite vendor payments and increase availability of finished products. For its next project, the company wanted to address problems in human resources (HR). By working with HR process owners, a focus for the project emerged: the payroll process. The following case study details the company's experience using the TQM methodology's Seven Steps of Problem Solving to address the issue.

Pre-step 1: Select the Problem

After attending an introductory two-day training program in TQM, the project leader asked the company's HR employees to brainstorm key problems in human resources. They also considered the results of each problem (Table 1).

Table 1: Problems in the Payroll Process

Problem | Result 1 | Result 2
Accuracy of data | Delay | Errors
Delayed output | Delay |
Functioning of payroll centralization process | Delay |
Manual data generation | Delay |
Follow-up on data | Delay |
High recruitment turnaround | |
Lack of standard operating procedures (SOPs) | Delay | Errors
Communication | Delayed response to employees | Delay

From this list, the group could see that the real problem was that internal customers were facing delays and errors. The group went on to brainstorm and prioritize the major areas of errors and delay within HR (Table 2).

Table 2: Prioritized Areas Where Employees Encounter Errors and Delays

Problem Area | Score
Employee database | 169
Payroll | 139
Separation | 125
Recruitment transfers | 117
Budget | 114
Talent development | 113
Performance management | 98
Communication | 90
Training | 64
Reimbursements | 63

Discussion revealed that the employee database was not a problem in itself; the team decided to tackle the payroll process instead. HR employees told the group that completing their job each month without delays or errors required "a lot of pressure and running around." A representative group from the finance department, the payroll manager, key payroll personnel and the four regional HR managers were selected for the project team. A leader and a secretary were nominated, and the team began meeting every other week.

Step 1 Defining the Problem

In TQM, a Problem = Desire − Actual status; problems also must be measurable. The team faced the challenge of measuring the "undue pressure" on the payroll employees. They decided that the metric of employee overtime could represent this pressure. The team set out to record how much overtime each employee was incurring daily and what activities they worked on during that overtime. Measurements during the first month yielded an average of 36 minutes of overtime per person per day. This average did not appear so bad. In reality, however, the problem was the peaks rather than the average; employees tend to remember the stressful days when overtime is high. To get a better picture, the team calculated a standard deviation of 18.8 minutes. This meant that on the worst days, overtime averaged 92 minutes per person (average + 3 standard deviations), and on those days there were two or three employees whose overtime was much higher than 92 minutes. Therefore, the team decided to reduce the average + 3 standard deviation limit to address the problem. They set a Phase 1 target of reducing the average + 3 standard deviation time by 50 percent.
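The arithmetic behind the 92-minute figure can be checked directly. Only the 36-minute mean and the 18.8-minute standard deviation come from the study; the rest is a worked check:

```python
# Worked check of the overtime metric: mean + 3 standard deviations.
avg_overtime = 36.0   # minutes per person per day (from the study)
sd_overtime = 18.8    # minutes (from the study)

worst_day = avg_overtime + 3 * sd_overtime
print(f"mean + 3*sigma = {worst_day:.1f} minutes")   # ~92 minutes

phase1_target = worst_day / 2                        # 50 percent reduction goal
print(f"Phase 1 target: about {phase1_target:.0f} minutes")
```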

Step 2: Finding the Root Causes


The team mapped overtime activities in a Pareto diagram to ascertain the vital causes (Figure 1). Table 3 shows the top seven causes, accounting for 81 percent of the overtime.

Figure 1: Overtime Activities

Table 3: Top Seven Overtime Causes

Problem | Overtime Percent
Recruitment | 17
Meetings | 16
Data crunching | 14
Employee relations | 14
Master changes in SAP | 10
Special projects | 5
Head office formats | 5

Recruitment necessitated after-hours interviews, while meetings involved other departments not yet trained in TQM. The causes that the team could change were data crunching, master changes in SAP (the enterprise resource planning program) and repeated changes in the data formats requested by the head office. These three areas constituted 29 percent of the overtime and were addressed first. Sixty percent of the overtime in these areas emanated from two regions; another 35 percent came from two employees in the head office. Why? The representatives of the other regions explained that they had put in a special one-time effort to develop data entry and storage formats for the diverse information requested by the head office, reducing future data crunching. They shared this standardized formatting with the two lagging regions to reduce their overtime. But why were the regions developing formats in the first place? Were the formats not already present? The team mapped the current process steps:

1. Regions enter changes to be made in the SAP personnel master into an Excel sheet.
2. The Excel sheet is sent to the head office.
3. Head office employees enter the data into SAP before the payroll run each month.

The payroll employees face intense pressure due to gaps and errors in the data entry.

Step 3: Countermeasure Ideas


The team suggested a two-phase process change using just-in-time principles:

Phase 1: Replace batching with flow processing. With this method, regions enter and send data weekly, and the head office enters it weekly, without waiting until the end of the month.

Phase 2: Eliminate non-value-added stages. Eventually, the regions should be able to enter data directly into SAP weekly, and the head office will make its own entries weekly.
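To see why the switch from batching to flow flattens the pre-payroll peak, consider a toy calculation; all the numbers below are invented for illustration:

```python
# Toy comparison of peak daily workload: batch vs. flow entry of master changes.
CHANGES = 200     # hypothetical master changes arriving per month
WORK_DAYS = 20    # working days per month

batch_peak = CHANGES / 2          # batching: everything entered in 2 pre-payroll days
weekly_peak = CHANGES / 4         # Phase 1: one entry session per week (4 per month)
daily_peak = CHANGES / WORK_DAYS  # Phase 2: direct entry spread across the month

print(f"batch: {batch_peak:.0f}/day, weekly flow: {weekly_peak:.0f}/day, "
      f"daily flow: {daily_peak:.0f}/day")
```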

Steps 4 and 5: Testing Ideas and Checking Results


The countermeasure ideas took two months to test. An X-bar control chart was introduced to track the average overtime per person per day. The chart showed an approximately 46 percent reduction in average time + 3 standard deviations, from 92 minutes to 50 minutes.

Step 6: Standardizing Operations

The 3 standard deviation limit was maintained. Simultaneously, however, employees were also experiencing stress and working overtime due to errors or incomplete entries during the payroll run and frantic queries for the correct information. Finding the most frequent errors, their root causes and countermeasures would eliminate this problem. The team selected the metric of errors and queries per payroll run. There were 65 in the first run. Following is an example of an error, its cause and the countermeasure the team developed to resolve it:

Error: Incorrect deduction of lunch coupons
Number of occurrences: 11 in two months, or 15 percent of total errors
Root cause analysis: All errors occurred in one region. That region gave lunch coupons at the beginning of the month, while the other regions gave them at the end of the month, which made their accounting foolproof.

Countermeasure: Adopt the standard process
Check the result: No errors post-implementation

Within three months, errors and queries were reduced by 98 percent, from 65 per payroll run to 1. Regular progress tracking was introduced (Figure 2).

Figure 2: Errors and Queries Per Payroll Run

Step 7: Maintain Improvements


The team compiled the improvement results and presented them to management. In the future, the payroll manager will meet with the staff after each payroll run to analyze and address any errors that occur. The overtime control chart will be plotted every day, and any unusual spikes will also be analyzed and addressed.

The project also led to changes in the mindsets of the employees involved. For instance, after the project, the human resources director remarked that one of the participants had made an error in his work and reported it, along with a 5 Whys and countermeasure analysis, something that would never have happened earlier.
