
A Continuum of Disk Scheduling Algorithms

ROBERT GEIST Clemson University and STEPHEN DANIEL Microelectronics Center of North Carolina

A continuum of disk scheduling algorithms, V(R), having endpoints V(0) = SSTF and V(1) = SCAN, is defined. V(R) maintains a current SCAN direction (in or out) and services next the request with the smallest effective distance. The effective distance of a request that lies in the current direction is its physical distance (in cylinders) from the read/write head. The effective distance of a request in the opposite direction is its physical distance plus R × (total number of cylinders on the disk). By use of simulation methods, it is shown that this definitional continuum also provides a continuum in performance, both with respect to the mean and with respect to the standard deviation of request waiting time. For objective functions that are linear combinations of the two measures, μ_W + kσ_W, intermediate points of the continuum are seen to provide performance uniformly superior to both SSTF and SCAN. A method of implementing V(R) and the results of its experimental use in a real system are presented.

Categories and Subject Descriptors: D.4.1 [Operating Systems]: Process Management-scheduling; D.4.4 [Operating Systems]: Communications Management-input/output; D.4.8 [Operating Systems]: Performance-measurements; modeling and prediction; simulation

General Terms: Algorithms, measurement, performance

Additional Key Words and Phrases: First come, first serve (FCFS), moving-head disk, SCAN, scheduling, shortest seek time first (SSTF), simulation, system measurements, V(R)

1. INTRODUCTION
Scheduling algorithms for moving-head disks have been studied for many years, but which algorithm is best is still an open question. Simulation studies [4, 8, 9] have provided considerable insight into the operational characteristics of a plethora of proposed algorithms, but few studies report measurements from real systems, and analytical studies that attempt to calculate expected algorithm performance are nearly nonexistent [2]. As a result, most scheduling algorithms in use today are variations on one of a few central themes. First come, first served (FCFS) is still a popular choice. It is easy to implement and it is fair in that the expected waiting
Authors' present addresses: R. Geist, Department of Computer Science, Clemson University, Clemson, SC 29634-1906; S. Daniel, Thinking Machines Corporation, 245 First St., Cambridge, MA 02142. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1987 ACM 0734-2071/87/0200-0077 $00.75 ACM Transactions on Computer Systems, Vol. 5, No. 1, February 1987, Pages 77-92.


time of a request (exclusive of service, i.e., exclusive of its seek and transfer times) is independent of its cylinder. On the other hand, it is generally believed that FCFS imposes on the system an unacceptably high mean waiting time for requests, and most studies agree that head position information should be used in scheduling [4, 5, 8].

Shortest seek time first (SSTF), an algorithm in which the request closest to the current head position is serviced next, represents the most obvious use of such information. SSTF appears to overcome the mean waiting time deficiency of FCFS and has been credited with the fastest mean response [4, 9]. It has also been credited with intolerable response time variances [9], although this latter claim has been disputed more recently in [4].

The SCAN algorithm represents an attempt to control the variance of waiting time without sacrificing too much in the mean. SCAN services all requests in ascending order by cylinder number, until no requests for a higher cylinder can be found. The scan direction is then reversed, and cylinders are processed in decreasing order. The largest measure of support for the efficacy of scanning algorithms comes from the extensive simulation study by Teorey and Pinkerton [8]. Their conclusions are suspect, however, because the results on which they base their selection of a best algorithm appear to be from portions of the simulations in which the systems under study are supersaturated, with mean arrival rates far in excess of mean service rates. (Consider the saturation curve λ = 1/T_s in [8, Figure 2].) For any such system, the request queue necessarily grows without bound, and so it is not clear how such simulations compare with real systems, whose request queues obviously cannot grow in this fashion.

In [3], we introduced V-SCAN, a variable mixture of SSTF and SCAN. Preliminary results there indicated that V-SCAN might be tuned to give a performance superior to both SSTF and SCAN in terms of the combined goodness measure originally proposed by Teorey and Pinkerton [8]. It is the purpose of this paper to recast V-SCAN in the framework of a one-dimensional continuum of disk scheduling algorithms. SSTF and SCAN will be seen to represent endpoints of the continuum. Our main result, obtained through extensive simulation, is that the continuum so defined also represents a predictable continuum in performance, both for the mean and for the variance of request waiting time. We conclude with experimental results from a real system in which various points of the continuum were installed and tested against each other and against FCFS.
2. THE CONTINUUM

For each real R ∈ [0, 1] we define a scheduling algorithm V(R) as follows: The request closest to the current head position is always serviced next, but closest is no longer defined in the strict Euclidean sense. Specifically, V(R) maintains a current scan direction, which is defined to be the direction of the last seek. The distance to any request in the current direction is then simply the number of cylinders to seek over, as expected. However, the distance to any request in the opposite direction is given by (number of cylinders to seek) + R × (total number of cylinders on the disk). Thus V(0) = SSTF, and V(1) = SCAN. Intermediate values of R essentially provide a SCAN window, outside which SSTF is invoked. (See Figure 1.)
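The definition above can be made concrete with a short sketch. The Python below is illustrative only: the function names `effective_distance` and `select_next`, the encoding of the scan direction as +1 (toward higher cylinders) or -1, and the choice to leave a request on the head's own cylinder unpenalized are assumptions of this sketch, not details taken from the paper.

```python
from typing import Sequence, Tuple


def effective_distance(head: int, direction: int, cylinder: int,
                       r: float, total_cylinders: int) -> float:
    """Effective distance of a request at `cylinder` under V(R).

    direction is +1 (toward higher cylinders) or -1 (toward lower ones).
    """
    offset = cylinder - head
    physical = abs(offset)
    # Same cylinder (assumed unpenalized) or request lies in the scan direction.
    if offset == 0 or (offset > 0) == (direction > 0):
        return float(physical)
    # Request lies behind the head: add the reversal penalty.
    return physical + r * total_cylinders


def select_next(head: int, direction: int, pending: Sequence[int],
                r: float, total_cylinders: int) -> Tuple[int, int]:
    """Return (cylinder to service next, scan direction after the seek)."""
    best = min(pending, key=lambda c: effective_distance(
        head, direction, c, r, total_cylinders))
    if best == head:
        return best, direction  # no seek, so the direction is unchanged
    return best, (1 if best > head else -1)
```

With the head at cylinder 100 moving toward higher cylinders on a 200-cylinder disk and requests pending at cylinders 90 and 115, R = 0 selects 90 (pure SSTF), whereas R = 0.2 adds a penalty of 40 cylinders to the reverse-direction request and selects 115.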
[Figure 1: disk arm diagram labeled with the spindle, the head, the current direction, and a span of R × (total number of cylinders); example request priorities: 1st: A, 2nd: B, 3rd: C, 4th: D and E (tie).]

Fig. 1. Conceptual operation of V(R).

Although implementation of V(R) might appear to be relatively cumbersome, we will see in Section 4 that at the instant we are ready to begin a seek we really need only compute the effective distances to at most two requests, and we never need to reshuffle request queues.
3. THE SIMULATION

Extensive simulation studies of V(R) were conducted for various values of R. Disk parameters (size and speed) were matched to the values used by Teorey and Pinkerton (see Table I), and the arrival stream of requests to the disk was assumed to be a time-homogeneous Poisson process. Various arrival rates for this process, as well as various distributions of the requests across the cylinders, were considered. The distribution of requests across the sectors within any given cylinder was assumed uniform. The 4.2BSD UNIX¹ function random() was used to drive the arrival streams. In each case the mean and the standard deviation of request waiting time were estimated.

¹ UNIX is a trademark of AT&T Bell Laboratories.

Table I. Disk Specifications

                                                  IBM 2314                 Fujitsu 134-Mbyte
                                                  (Teorey and Pinkerton)   SMD Drives
Number of cylinders                               200                      801
Sectors per track                                 4                        32
Seek time (milliseconds per cylinder)             0.50                     0.06
Seek startup (milliseconds)                       30.00                    5.94
Rotational speed (milliseconds per revolution)    25.00                    16.66

Successive seeks are not independent: If the last seek was to the middle cylinder, the probability that the next will be over more than half the disk is zero. Since the seek time of each request is a component of the waiting time of the subsequent request (if pending), we must allow for a correlation in the waiting times of successive requests. In order to counter the effects of correlation among successive samples and the lack of true system regeneration points, we chose for our simulation the rather time-consuming method of independent replication [5].

On the basis of initial runs, the amount of time required for the generation of 50 requests was deemed a suitable transient (stabilization) period. In each subsequent simulation run, the first 50 requests were flagged as dummy requests, that is, requests for which no measurements were taken, even though the arm was moved to the requested cylinder and the transfer (rotation to end of requested sector) was completed for each. Following this, a fixed number of real requests were generated, and dummy requests were added to the queue until all the real requests had been processed. Each such run was replicated 500 times, and the results were averaged. An appropriate request sample size was selected on the basis of the results of runs of 1000 real requests per replication. It was estimated that 2000 real requests per replication would be sufficient in every case (i.e., for all rates and distributions of requests) to ensure 90 percent confidence in a maximum error of 5 percent in our estimates of mean waiting time μ_W.

Our estimates of σ_W were made using the jackknife method of J. W. Tukey [6, 7], which removes the first-order bias caused by correlation within sample runs and again allows us to construct 90 percent confidence intervals. Failure to account properly for this correlation could result in an underestimation of σ_W and thus lead to overly optimistic predictions for performance. However, in our simulation studies we found that the ordinary sample standard deviation underestimated the jackknifed values by at most 0.1 percent, and thus, in retrospect, the ordinary sample standard deviation would have been an adequate measure.

Figure 2 shows the result of simulations that utilize a uniform arrival distribution (every cylinder is equally likely). Sample mean waiting time is plotted as a function of arrival rate for V(0), V(0.1), V(0.2), V(0.3), and V(1). Ordinate values have been normalized so that the value for V(0) = SSTF is always 1. Ninety percent confidence intervals were calculated, and, in each case, their sizes were nearly identical to those obtained for V(0.2) (shown in the figure). The low envelope representing best performance is given by

Arrival Rate (requests per millisecond)    Algorithm
0.004-0.011                                V(0) = SSTF
0.011-0.018                                V(0.1)
0.018-0.024                                V(0.2)
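The jackknife estimate of the standard deviation discussed in this section can be illustrated with a generic delete-one-group jackknife, in which each replication forms one group. This is a minimal sketch under that assumption; it is not claimed to be the exact procedure of [6, 7], and the function name `jackknife_std` is hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def jackknife_std(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Delete-one-group jackknife estimate of a standard deviation.

    groups holds one sequence of waiting times per replication.
    Returns (jackknife estimate, standard error of that estimate).
    """
    n = len(groups)
    if n < 2:
        raise ValueError("need at least two groups")
    pooled = [w for g in groups for w in g]
    theta_all = statistics.stdev(pooled)  # ordinary sample standard deviation
    pseudo: List[float] = []
    for i in range(n):
        # Recompute the estimator with replication i left out.
        rest = [w for j, g in enumerate(groups) if j != i for w in g]
        theta_i = statistics.stdev(rest)
        # Tukey pseudo-value for the omitted group.
        pseudo.append(n * theta_all - (n - 1) * theta_i)
    estimate = statistics.fmean(pseudo)
    std_error = statistics.stdev(pseudo) / math.sqrt(n)
    return estimate, std_error
```

Under a normal approximation, a 90 percent confidence interval would be the estimate plus or minus about 1.645 standard errors. Comparing the returned estimate with `statistics.stdev` of the pooled data shows how much first-order bias the jackknife removes.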

Beyond an arrival rate of 0.024 the performance of each algorithm begins to converge on that of SSTF. The maximum arrival rate of 0.032 is the estimated SSTF and SCAN saturation rate for the indicated parameters [8]. We note that FCFS, though not realizable as a part of the continuum, was also simulated, but the waiting-time values were uniformly too large to appear in the figure. In fact, waiting-time mean and variance estimates for FCFS were too large to appear in any of our figures, and thus we support the conclusions reached in [4]. Figures 3 and 4 present the results of runs in which the arrival rate is held constant at a subsaturation level (0.015 requests per millisecond), and the shape

of the arrival distribution across the cylinders is varied. In Figure 3 the distribution of arrivals across cylinders is unimodal and varies from a highly skewed, asymmetric distribution to a symmetric, nearly normal one. (Abscissa values represent k in E_k; see the Appendix.) Again, the sizes of the 90 percent confidence intervals are indicated by those for V(0.2) (shown in the figure). Although we still obtain a rather elegant continuum of performance, the low envelope is now provided by a single algorithm, V(0) = SSTF. The distribution in Figure 4 is bimodal and varies from almost flat to very sharply peaked (see the Appendix). Arrival rate is again fixed at 0.015 requests per millisecond. Once again we find an extremely regular performance pattern for which V(0.1) provides the low envelope.

At this stage we might be led to conclude that, if the mean arrival rate is low to moderate, then on the basis of mean waiting time we could do little better than to simply choose SSTF (or perhaps V(0.1)). There are two problems with this approach. First, as arrival rate increases beyond a rather moderate threshold level (about 0.016 in Figure 2), the performance of SSTF with respect to mean waiting time suffers in comparison with V(R), for any R > 0. Second, the mean is not the whole story. Similar regular performance patterns were observed with respect to waiting-time variance, but the optimality was inverted: V(1) = SCAN was most often the algorithm of choice! This can be seen in Figures 5-7, where we plot estimated standard deviation of waiting time for the same arrival rates and request distributions as those used in Figures 2-4. Note the relatively poor performance by V(0.1) and V(0) = SSTF.

It should not be surprising, then, that for various linear combinations of the sample mean and sample standard deviation, m + kS_W, V(R) with 0 < R < 1 exhibits a performance uniformly superior to both SSTF and SCAN. For example, consider the measure m + 0.675S_W, which, by an entirely reasonable application of the central limit theorem, can be regarded as an upper bound on 75 percent of the waiting times. (Recall that we have simulated 1,000,000 requests per data point.) With respect to this measure, we find that for all distributions studied and all arrival rates up to 0.028, the V(0.2) algorithm provides the low envelope.

The implications here are rather apparent. Different systems inherently require different weights to be placed on the relative importance of response time mean and variance. (Compare interactive versus batch orientation.) V(R) allows the system administrator to bring the operational characteristics of the disk subsystem into line with those externally imposed preferences. On the other hand, at this time we can only offer two guiding principles on how to choose R:

(1) Larger values of R tend to reduce the waiting-time variance, and smaller values tend to reduce the mean, though R = 0 may give rise to a threshold phenomenon.
(2) For R > 0.4, the performance of V(R) has not been measurably different from that of V(1), either in simulation or in experiments with a real system.
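The multiplier 0.675 in the combined measure above is the 75th-percentile point of the standard normal distribution, which is why the measure bounds roughly 75 percent of the waiting times. The sketch below checks that value and shows how such a measure might be used to compare settings of R. The function names and the idea of passing in per-R samples of waiting times are assumptions made for illustration; no simulation data from the paper is reproduced here.

```python
from statistics import NormalDist, fmean, stdev
from typing import Dict, Sequence


def coverage_multiplier(fraction: float) -> float:
    """k such that mean + k * std bounds `fraction` of a normal population."""
    return NormalDist().inv_cdf(fraction)


def combined_measure(waiting_times: Sequence[float], k: float) -> float:
    """Sample mean plus k sample standard deviations."""
    return fmean(waiting_times) + k * stdev(waiting_times)


def pick_r(waits_by_r: Dict[float, Sequence[float]], k: float) -> float:
    """Return the R whose observed waiting times minimize the combined measure."""
    return min(waits_by_r, key=lambda r: combined_measure(waits_by_r[r], k))
```

Calling `coverage_multiplier(0.75)` gives approximately 0.674, matching the rounded 0.675 used in the text. A larger k weights variance more heavily and so, following guiding principle (1), would tend to favor larger values of R.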
4. IMPLEMENTATION

We implemented V(R) under the UNIX operating system. Several different values of the decision parameter R were tested against each other and against FCFS.


Table II. V(0.1) versus V(0.2)

                                               R = 0.1    R = 0.2    Combined
Running time (minutes)                         719.17     719.30     1438.47
Average queue length
  Drive 0                                      0.197      0.202      0.200
  Drive 1                                      0.265      0.287      0.276
Average arrival rate (requests per second)
  Drive 0                                      3.48       3.65       3.56
  Drive 1                                      6.68       7.22       6.95
Average waiting time (seconds)
  Drive 0                                      0.057      0.055      0.056
  Drive 1                                      0.040      0.040      0.040
System throughput (characters per second)
  Ave                                          90.55      94.40      92.48

W statistic: 1.494

Table III. FCFS versus V(0.2)

                                               FCFS       R = 0.2    Combined
Running time (minutes)                         719.29     719.33     1438.62
Average queue length
  Drive 0                                      0.379      0.382      0.381
  Drive 1                                      0.436      0.429      0.433
Average arrival rate (requests per second)
  Drive 0                                      6.39       6.66       6.53
  Drive 1                                      8.71       9.10       8.91
Average waiting time (seconds)
  Drive 0                                      0.059      0.057      0.058
  Drive 1                                      0.050      0.047      0.049
System throughput (characters per second)
  Ave                                          141.24     147.96     144.60

W statistic: 1.206

Table IV. SSTF versus V(0.2)

                                               R = 0      R = 0.2    Combined
Running time (minutes)                         719.29     719.17     1438.46
Average queue length
  Drive 0                                      0.259      0.257      0.258
  Drive 1                                      0.395      0.448      0.422
Average arrival rate (requests per second)
  Drive 0                                      4.91       4.84       4.87
  Drive 1                                      9.47       10.50      9.98
Average waiting time (seconds)
  Drive 0                                      0.053      0.053      0.053
  Drive 1                                      0.042      0.042      0.042
System throughput (characters per second)
  Ave                                          96.61      93.45      95.03

W statistic: 1.047

Table V. SCAN versus V(0.2)

                                               R = 1      R = 0.2    Combined
Running time (minutes)                         720.76     720.60     1441.36
Average queue length
  Drive 0                                      0.461      0.467      0.464
  Drive 1                                      0.543      0.576      0.560
Average arrival rate (requests per second)
  Drive 0                                      7.27       7.40       7.34
  Drive 1                                      10.86      11.48      11.17
Average waiting time (seconds)
  Drive 0                                      0.063      0.063      0.063
  Drive 1                                      0.050      0.050      0.050
System throughput (characters per second)
  Ave                                          93.16      87.69      90.43

W statistic: 1.346

To implement V(R), the driver must maintain for each disk the current cylinder number, the direction of the last seek, and two request queues. One queue contains all requests for cylinder numbers lower than that of the current cylinder, while the other queue contains all requests for higher numbered cylinders. Both queues are sorted with their requests in ascending order by physical distance from the current cylinder. When a request arrives, it is sorted into the appropriate queue. When a request must be started and both queues are nonempty, the driver computes the distance from the current cylinder to the first request on each queue. The request that will necessitate a change in the scan direction is then penalized by the addition of a weight parameter, R × (total number of cylinders on the disk), to its computed distance. The request with the smaller resulting effective distance is selected and processed, and the current cylinder and scan direction are then updated. We find that this implementation is not measurably slower than the original UNIX scheduling algorithm, a variation on SCAN.

Various switching and measuring features were added to the disk driver, and the following strategy was adopted for testing different algorithms and parameter values: The disk queue length on each drive was sampled 200 times per second. A real-time clock, running independently of any system process, was used to control the sampling. After one minute, the disk driver was given a new set of parameters, old measurement values were reported, and the counters were reset. No attempt was made to measure the distribution of requests across cylinders. This method was adopted from [1]. The experiments each ran for 24 hours, and the results are shown in Tables II-V.

For our purposes, two of the measured values are considered of greatest importance. One is the mean wait time (measured here in seconds), and the other is the global system throughput measure. The latter measure is in characters per second and is the average number of characters typed on all terminals each second. Since our UNIX is a timesharing system used mostly for text processing and program development, the characters output on the terminals represent the real work of the system. Further, this measure, though technically different from requests completed per second, is certainly linked to the system's response time (Little's theorem [11]).
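The two-queue scheme described in this section can be sketched in a few lines. The Python below is a user-level illustration, not the driver code: the class name `VRScheduler`, the use of sorted lists with `bisect`, the initial upward direction, the rule that ties go to the current direction, and the choice to leave a request on the current cylinder unpenalized are all assumptions of this sketch.

```python
import bisect
from typing import List, Optional


class VRScheduler:
    """Two sorted queues around the head; only the two nearest are compared."""

    def __init__(self, r: float, total_cylinders: int, start_cylinder: int = 0):
        self.r = r
        self.total = total_cylinders
        self.current = start_cylinder
        self.direction = 1          # +1 up, -1 down (initial value assumed)
        self.lower: List[int] = []  # ascending; nearest request is the last
        self.upper: List[int] = []  # ascending; nearest request is the first

    def add(self, cylinder: int) -> None:
        """Sort an arriving request into the queue on its side of the head."""
        queue = self.lower if cylinder < self.current else self.upper
        bisect.insort(queue, cylinder)

    def next_request(self) -> Optional[int]:
        """Pick, remove, and return the next cylinder to service."""
        if not self.lower and not self.upper:
            return None
        if not self.lower:
            go_up = True
        elif not self.upper:
            go_up = False
        else:
            up = float(self.upper[0] - self.current)
            down = float(self.current - self.lower[-1])
            penalty = self.r * self.total
            # Penalize only the request that would reverse the scan.
            if self.direction > 0 and down > 0:
                down += penalty
            elif self.direction < 0 and up > 0:
                up += penalty
            go_up = up < down or (up == down and self.direction > 0)
        target = self.upper.pop(0) if go_up else self.lower.pop()
        if target != self.current:
            self.direction = 1 if target > self.current else -1
        self.current = target
        return target
```

Because the selected request is always the nearest one on its side, the head never passes over a queued request, so neither queue needs to be re-sorted after a seek. A kernel driver would more likely use linked lists than Python lists; `pop(0)` is linear time here and is kept only for brevity.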
5. TEST RESULTS

Our UNIX system is a O.&Mbyte PDP 11/70 with two 134-Mbyte SMD Fujitsu Winchester drives. (See Table I for disk specifications.) During the day, our typical load is eight or more users doing program development, text editing, and some text formatting.

As distributed, UNIX uses a circular SCAN algorithm, that is, a SCAN in the ascending direction only, followed by a seek back to the lowest numbered cylinder requested. It is further modified to give all read requests priority over all write requests. Since a running process must wait for a read to finish, but UNIX will buffer writes and allow the process to continue, the authors of the UNIX disk algorithm believed that reads should be given priority. Since reads outnumber writes by eight to one, it is likely that this priority scheme will have relatively little impact on overall disk throughput. In [3], an algorithm equivalent to V(0.1) was shown to be decidedly superior to the UNIX circular SCAN, and so the latter was not included in the present set of tests.

In Table II we show the results of our first test, V(0.1) against V(0.2). We see that V(0.2) provided a slight (3.5 percent) improvement in mean waiting on Drive 0, but none on Drive 1. On the other hand, the improvement in throughput was significant. (Note that arrival rates will be affected by the scheduling discipline and hence are likely to differ.) The W statistic shown represents a Wilcoxon nonparametric test [11] of the hypothesis of equal throughput against the alternative that throughput under V(0.2) is superior. The 24-hour runs yield a sample size of 1440 (1-minute intervals). We find that we can reject the equality hypothesis, in favor of the alternative, with a descriptive level [11] of 0.0675. Keep in mind that the Wilcoxon test is completely free of assumptions regarding the underlying distributions. As a result, it provides a much safer test than those that depend on distributional assumptions, but at some expense in the descriptive levels. We contend that the caution is warranted.

Tables III, IV, and V present the results of subsequent tests of V(0.2) against FCFS, SSTF, and SCAN, respectively. Only in the test against FCFS did we find a difference in mean waiting time, where V(0.2) provided an improvement of 3.4 percent (Drive 0) and 6 percent (Drive 1). However, V(0.2) always provided a nice improvement in throughput. Even the descriptive level of 0.1475, obtained in the test against SSTF, must be declared substantial when we recall the simulation results and observe that neither of our measures (important to our system administrators) includes a direct component of waiting-time variance. Finally, we should note the consistency of performance. V(0.2) never provided a higher mean waiting time, nor a lower throughput, than its opponent.
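The descriptive levels quoted above are consistent with reading the W statistic as a standard normal deviate and taking the one-sided upper tail probability. That reading is an assumption of this sketch (the text does not spell out the normalization), and the function name `descriptive_level` is hypothetical.

```python
from statistics import NormalDist


def descriptive_level(w: float) -> float:
    """One-sided upper-tail probability of a standard normal deviate w."""
    return 1.0 - NormalDist().cdf(w)


if __name__ == "__main__":
    # W statistics as reported in Tables II-V.
    for label, w in (("V(0.1)", 1.494), ("FCFS", 1.206),
                     ("SSTF", 1.047), ("SCAN", 1.346)):
        print(f"V(0.2) versus {label}: W = {w}, level = {descriptive_level(w):.4f}")
```

Under this reading, W = 1.494 and W = 1.047 give levels within about 0.0001 of the 0.0675 and 0.1475 stated in the text, which lends some support to the interpretation.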

6. CONCLUSIONS

We have presented V(R), a continuum of disk scheduling algorithms having endpoints V(0) = SSTF and V(1) = SCAN. Extensive simulations indicate that this definitional continuum also provides a performance continuum, with respect to both the mean and the variance of request waiting time. Since these two measures tend to reach optimal values near opposite endpoints of the continuum, it is not surprising that for objective functions that are linear combinations of the two, μ_W + kσ_W, intermediate points on the continuum provide performance that is uniformly superior to both SSTF and SCAN. In tests from a real system, we found V(0.2) to consistently outperform V(0.1), FCFS, SSTF, and SCAN.

Are the performance improvement and the ability to adapt quickly to changing system conditions worth the additional complexity? We feel that they are, and V(0.2) is now running in our UNIX system. Nevertheless, we recognize that others may well be able to justify the opposite conclusion. Certainly the efficient implementation of V(R) presented in Section 4 renders it no more complex than SSTF (or SCAN), so the question is easily reduced to one of V(R) versus FCFS. Our simulation results and those of other studies [4, 8, 9] continue to indicate horrible performance for FCFS. On the other hand, our tests from a real system indicate that FCFS is merely inferior and far short of catastrophic. As with any system decision, many factors, including the system programmer's time and even the cost of additional hardware, must be carefully weighed.

APPENDIX

Bimodal Distribution Function

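[The formula for the bimodal density is missing from this copy.]

where A is a normalization constant, a1 is 0.25, a2 is 0.75, and the remaining parameter (its symbol is also missing) is adjustable.

Unimodal Asymmetric Distribution Function (k-stage Erlang)

$$f(x) = \frac{k\mu\,(k\mu x)^{k-1}\,e^{-k\mu x}}{(k-1)!}$$

where μ = 3.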
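The k-stage Erlang density above can be evaluated and sampled with the standard library. This is a minimal sketch: the function names are hypothetical, the density is the standard k-stage Erlang with mean 1/μ, and how a sampled x is mapped onto cylinder numbers is not stated in the text, so no such mapping is attempted here.

```python
import math
import random


def erlang_pdf(x: float, k: int, mu: float = 3.0) -> float:
    """Density of the k-stage Erlang distribution with mean 1/mu."""
    if x < 0:
        return 0.0
    rate = k * mu
    return rate * (rate * x) ** (k - 1) * math.exp(-rate * x) / math.factorial(k - 1)


def sample_erlang(k: int, mu: float = 3.0, rng: random.Random = random) -> float:
    """Draw one value; an Erlang is a gamma with integer shape k and scale 1/(k*mu)."""
    return rng.gammavariate(k, 1.0 / (k * mu))


def midpoint_integral(f, a: float, b: float, n: int) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))
```

For k = 1 the density reduces to the exponential with rate μ, which is highly skewed; as k grows the shape becomes more symmetric and more sharply peaked around 1/μ, matching the description of Figure 3 as ranging from asymmetric to nearly normal.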
REFERENCES
(Note: Reference [10] is not cited in text.)

1. BARD, Y. Experimental evaluation of system performance. IBM Syst. J. 12, 3 (1973), 302-314.
2. COFFMAN, E., AND HOFRI, M. On the expected performance of scanning disks. SIAM J. Comput. 11 (1982), 60-70.
3. DANIEL, S., AND GEIST, R. V-SCAN: An adaptive disk scheduling algorithm. In Proceedings of the IEEE International Symposium on Computing Systems Organization (New Orleans, Mar.), 1983, pp. 96-103.
4. HOFRI, M. Disk scheduling: FCFS vs. SSTF revisited. Commun. ACM 23, 11 (Nov. 1980), 645-653.
5. KOBAYASHI, H. Modeling and Analysis. Addison-Wesley, Reading, Mass., 1978.
6. MILLER, R. Jackknifing variances. Ann. Math. Stat. 39, 2 (1968), 567-582.
7. SOKAL, R., AND ROHLF, F. Biometry. W.H. Freeman, San Francisco, 1981.
8. TEOREY, T. J., AND PINKERTON, T. B. A comparative analysis of disk scheduling policies. Commun. ACM 15, 3 (Mar. 1972), 177-184.
9. TEOREY, T. Properties of disk scheduling policies in multiprogrammed computer systems. In Proceedings of the AFIPS Fall Joint Computer Conference. AFIPS Press, Reston, Va., 1972.
10. TRIVEDI, K. Analytic modeling of computer systems. Computer 11, 10 (1978), 38-56.
11. TRIVEDI, K. Probability and Statistics with Reliability, Queueing and Computer Science Applications. Prentice-Hall, Englewood Cliffs, N.J., 1982.

Received November 1984; revised July 1985 and May 1986; accepted September 1986
