
https://webservices.ignou.ac.in/virtualcampus/adit/course/cst103/block3/unit2/index.htm
K-way sort merge algorithm.
(Variant)
External Sorting
Many important sorting applications involve processing very large files, much
too large to fit into the primary memory of any computer.
Methods appropriate for such applications are called external methods, since
they involve a large amount of processing external to the central processing
unit.
There are two major factors which make external algorithms quite different:
First, the cost of accessing an item is orders of magnitude greater than any
bookkeeping or calculating costs.
Second, on top of this higher cost, there are severe restrictions on access,
depending on the external storage medium used: for example, items on a magnetic
tape can be accessed only sequentially.
SORTING WITH DISKS
The most popular method for sorting on external storage devices is merge sort.
This method consists of essentially two distinct phases. First, segments of the
input file are sorted using a good internal sort method. Second, the sorted
segments are merged until a single sorted file remains.
K-way
K-Way Merge Algorithms or Multiway Merges are a specific type of Sequence Merge
Algorithms that specialize in taking in multiple sorted lists and merging them
into a single sorted list. These merge algorithms generally refer to merge
algorithms that take in a number of sorted lists greater than two. 2-Way Merges,
on the other hand, are referred to as binary merges and are also utilized in
k-way merge algorithms.
K-way merges find greater use in external sorting procedures. External sorting
algorithms are a class of sorting algorithms that can handle massive amounts of
data. External sorting is required when the data being sorted do not fit into
the main memory of a computing device (usually RAM) and instead must reside in
the slower external memory (usually a hard drive). K-way merge algorithms
usually take place in the second stage of external sorting algorithms, much like
they do for merge sort.
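As an illustration of the k-way merge described above, here is a minimal in-memory sketch in Python. It assumes the sorted lists fit in memory and uses a binary heap to find the smallest remaining item; the function name `k_way_merge` is hypothetical and not taken from the notes.

```python
import heapq


def k_way_merge(sorted_lists):
    """Merge any number of already-sorted lists into one sorted list."""
    heap = []
    # Seed the heap with the first item of every non-empty list.
    # Each entry is (value, list index, position inside that list).
    for idx, lst in enumerate(sorted_lists):
        if lst:
            heap.append((lst[0], idx, 0))
    heapq.heapify(heap)

    merged = []
    while heap:
        # The heap root is the smallest item among the heads of all lists.
        value, idx, pos = heapq.heappop(heap)
        merged.append(value)
        nxt = pos + 1
        if nxt < len(sorted_lists[idx]):
            # Replace the consumed item with its successor from the same list.
            heapq.heappush(heap, (sorted_lists[idx][nxt], idx, nxt))
    return merged
```

With k lists and n items in total, each item passes through the heap once, so the sketch does on the order of n log k comparisons.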
Sort-merge process:
Do while there are still records in the data file
    Do for i = 1 to k
        Do for j = 1 to k
            Fill in input buffer j;
            Sort the records in that buffer;
        enddo for j
        Merge records in the buffers, by transferring to output buffer;
        Write records to output file i;
    enddo for i
enddo while
The output files now become input files
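The sort-merge process above can be sketched in Python as follows. This is a simplified model, not the notes' implementation: the data file is a plain list, each output "file" is a list of sorted runs, and the names `create_initial_runs` and `buffer_size` are assumptions made for the example.

```python
import heapq


def create_initial_runs(records, k, buffer_size):
    """Build sorted runs from up to k buffers each and spread them over k files."""
    output_files = [[] for _ in range(k)]  # output file i holds a list of runs
    pos = 0   # read position in the data file
    i = 0     # index of the output file that receives the next run
    while pos < len(records):
        buffers = []
        for _ in range(k):
            # Fill one input buffer and sort it internally.
            chunk = records[pos:pos + buffer_size]
            pos += len(chunk)
            if chunk:
                buffers.append(sorted(chunk))
        # Merge the k sorted buffers into one run and write it to file i.
        run = list(heapq.merge(*buffers))
        output_files[i % k].append(run)
        i += 1
    return output_files
```

For example, `create_initial_runs([9, 3, 7, 1, 8, 2, 6, 4, 5, 0], 2, 2)` places the runs `[1, 3, 7, 9]` and `[0, 5]` on the first file and `[2, 4, 6, 8]` on the second.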
Merge process:
First merge pass:
do for j = 1 to k
{
    do for i = 1 to k
        Fill in input buffer i with records from the input file i;
    enddo for i
    /* We merge the records in the input buffers, write them to
       the next output buffer, and write them to output file j.
    */

    /* Remark: Merging is done by selecting the record
       with the smallest key from all the input buffers, writing it to the
       next available output buffer, and removing it from its input buffer.
       This process of selection continues, following these two rules:
       A) When a merge output buffer is filled, it is written to the merge
          output file j.
       B) If a merge input buffer has been completely processed, that
          merge input buffer is ignored during the selection. If only one
          merge input buffer contains records, copy them to the output buffer,
          and to the file.
    */
}
enddo for j
Repeat the process until the input files have been completely read.
Next merge passes:

The files of any previous merge pass now become the input files for the next
merge pass.
/* The files now contain sorted sequences that are k times longer than they were
before the merge, so we have to take that into account in the successive merge
passes.
*/
We now do a selection similar to the one in the first pass, but we modify the
second rule for merging:
B') If a merge input buffer has been processed, that buffer is refilled using
the next block from the same merge input file (until the entire sorted sequence
has been exhausted). The input buffer can be ignored only when the whole sorted
sequence has been used. When only one merge input sequence remains, the
remaining records can be written directly to the merge output file.
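Rule B' can be sketched in Python as below. The sketch assumes each sorted sequence is a list standing in for a file, and that only `block_size` records of each sequence are held in memory at a time; `merge_with_refill` and `block_size` are hypothetical names. For brevity it finds the smallest key by a linear scan of the buffer heads rather than a heap.

```python
def merge_with_refill(runs, block_size):
    """Merge sorted runs while holding only one block of each run in memory."""
    positions = [0] * len(runs)    # next unread index in each run ("file" position)
    buffers = [[] for _ in runs]   # in-memory input buffers, one per run

    def refill(i):
        # Read the next block of run i; the buffer stays empty once the run ends.
        start = positions[i]
        buffers[i] = list(runs[i][start:start + block_size])
        positions[i] += len(buffers[i])

    for i in range(len(runs)):
        refill(i)

    output = []
    while True:
        # Buffers that are still empty after a refill belong to exhausted runs
        # and are ignored during the selection.
        active = [i for i, buf in enumerate(buffers) if buf]
        if not active:
            break
        # Select the buffer whose first record has the smallest key.
        i = min(active, key=lambda j: buffers[j][0])
        output.append(buffers[i].pop(0))
        if not buffers[i]:
            refill(i)  # rule B': refill from the same run before ignoring it
    return output
```

A real implementation would also flush `output` to disk whenever an output buffer fills (rule A); the sketch keeps the whole result in one list to stay short.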

A new set of sorted sequences of records is ready to start the process again.
We use the next output file to write the results of sorting the new sorted
sequences.

Merge termination:

The process ends when a single sorted sequence, as long as the data file, has
been generated. It is the result of writing the output to a single file.
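Since each merge pass combines up to k sorted sequences into one, the number of passes needed before termination can be estimated by repeated division. The following Python sketch assumes every pass merges k sequences at a time; the name `merge_passes` is hypothetical.

```python
def merge_passes(num_runs, k):
    """Count k-way merge passes needed to reduce num_runs sequences to one."""
    if k < 2:
        raise ValueError("k must be at least 2")
    passes = 0
    while num_runs > 1:
        # One pass turns every group of up to k sequences into a single one.
        num_runs = -(-num_runs // k)  # ceiling division
        passes += 1
    return passes
```

For instance, 16 initial sequences need 4 passes with a 2-way merge but only 2 passes with a 4-way merge, which is why a larger k reduces the number of times each record is read and written.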

Sorting with Tapes


*********************
Characteristics
Processing large files, unable to fit into the main memory
Restrictions on the access, depending on the external storage medium
Primary costs: input-output
Main concern: minimize the number of times each piece of data is moved
between the external storage and the main memory.

General strategy - Sort-Merge


Break the file into blocks about the size of the internal memory
Sort these blocks
Merge sorted blocks
Usually several passes are needed, creating larger sorted blocks
until the whole file is sorted
Basic Algorithm
Assumptions:
four tapes:
two for input - Ta1, Ta2,
two for output - Tb1, Tb2.
Initially the file is on Ta1.
N records on Ta1
M records can fit in the memory
Step 1: Break the file into blocks of size M, giving ceil(N/M) blocks (the last block may be shorter)
Step 2: Sorting the blocks:
read a block, sort, store on Tb1,
read a block, sort, store on Tb2,
read a block, sort, store on Tb1,
etc., alternately writing on Tb1 and Tb2.
Each sorted block is called a run.
Each output tape will contain half of the runs.
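Steps 1 and 2 can be sketched in Python as follows, under the assumption that a tape is modelled as a list of runs and the input tape Ta1 as a plain list of records. The function name `distribute_runs` is hypothetical.

```python
def distribute_runs(records, m):
    """Cut records into blocks of size m, sort each, alternate between two tapes."""
    tb1, tb2 = [], []
    for start in range(0, len(records), m):
        # Read one block that fits in memory and sort it internally: one run.
        run = sorted(records[start:start + m])
        # Even-numbered blocks go to Tb1, odd-numbered blocks to Tb2.
        target = tb1 if (start // m) % 2 == 0 else tb2
        target.append(run)
    return tb1, tb2
```

With `records = [5, 2, 9, 1, 7, 3, 8]` and `m = 2`, Tb1 receives the runs `[2, 5]` and `[3, 7]`, and Tb2 receives `[1, 9]` and `[8]`.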

Step 3: Merge:
From Tb1, Tb2 to Ta1, Ta2.
Merge the first run on Tb1 and the first run on Tb2, and store the result on Ta1:
Read two records in main memory, compare, store the smaller on Ta1
Read the next record (from Tb1 or Tb2 - the tape that contained the record
stored on Ta1) compare, store on Ta1, etc.
Merge the second run on Tb1 and the second run on Tb2, store the result on Ta2.
Merge the third run on Tb1 and the third run on Tb2, store the result on Ta1.
Etc., storing the result alternately on Ta1 and Ta2.
Now Ta1 and Ta2 will contain sorted runs twice the size of the previous runs on
Tb1 and Tb2.
From Ta1, Ta2 to Tb1, Tb2.
Merge the first run on Ta1 and the first run on Ta2, and store the result on Tb1.
Merge the second run on Ta1 and the second run on Ta2, store the result on Tb2.
Etc., merge and store alternately on Tb1 and Tb2.
Repeat the process until only one run is obtained. This would be the sorted file.
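The merge step can be sketched in Python as below. The sketch models each tape as a list of runs and swaps the roles of the input and output tape pairs after every pass; `merge_two_runs` and `balanced_merge` are hypothetical names, and the whole thing is an in-memory illustration rather than real tape handling.

```python
def merge_two_runs(a, b):
    """Merge two sorted runs by repeatedly taking the smaller front record."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # One run is exhausted; copy the remainder of the other.
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def balanced_merge(in1, in2):
    """Merge the runs on two input tapes until a single sorted run remains."""
    while len(in1) + len(in2) > 1:
        out1, out2 = [], []
        for n in range(max(len(in1), len(in2))):
            # Pair the n-th run of each input tape; a missing run counts as empty.
            a = in1[n] if n < len(in1) else []
            b = in2[n] if n < len(in2) else []
            # Store the merged run alternately on the two output tapes.
            (out1 if n % 2 == 0 else out2).append(merge_two_runs(a, b))
        # The output tapes become the input tapes of the next pass.
        in1, in2 = out1, out2
    return in1[0] if in1 else (in2[0] if in2 else [])
```

Starting from Tb1 = `[[2, 5], [3, 7]]` and Tb2 = `[[1, 9], [8]]`, the first pass produces the runs `[1, 2, 5, 9]` and `[3, 7, 8]`, and the second pass merges them into `[1, 2, 3, 5, 7, 8, 9]`.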
symbol table
*"""""""""""""""
A symbol table is a data structure for storing a list of items
