K-way sort merge algorithm.
(Variant)
External Sorting
Many important sorting applications involve processing very large files, much
too large to fit into the primary memory of any computer.
Methods appropriate for such applications are called external methods, since
they involve a large amount of processing external to the central processing
unit.
There are two major factors which make external algorithms quite different:
First, the cost of accessing an item is orders of magnitude greater than any
bookkeeping or calculating costs.
Second, over and above this higher cost, there are severe restrictions on
access, depending on the external storage medium used: for example, items on a
magnetic tape can be accessed only in a sequential manner.
SORTING WITH DISKS
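The most popular method for sorting on external storage devices is merge sort.
This method consists of essentially two distinct phases. First, segments of the
input file are sorted using a good internal sort method. Second, the sorted
segments (runs) produced in the first phase are merged together until a single
sorted file remains.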
K-way
K-way merge algorithms, or multiway merges, are a specific type of sequence
merge algorithm that takes in multiple sorted lists and merges them into a
single sorted list. The term generally refers to merge algorithms that take in
more than two sorted lists. 2-way merges, on the other hand, are referred to as
binary merges, and they are also used within k-way merge algorithms.
K-way merges find greater use in external sorting procedures. External sorting
algorithms are a class of sorting algorithms that can handle massive amounts of
data. External sorting is required when the data being sorted do not fit into
the main memory of a computing device (usually RAM) and must instead reside in
the slower external memory (usually a hard drive). K-way merge algorithms
usually take place in the second stage of external sorting algorithms, much
like they do for merge sort.
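The idea of merging several sorted lists at once can be sketched in a few lines of Python. This is only an illustration: the function name `k_way_merge` is made up for this example, and the use of a min-heap (Python's standard `heapq` module) to pick the smallest head element is one common choice, not something the text above prescribes.

```python
import heapq

def k_way_merge(sorted_lists):
    """Merge any number of sorted lists into one sorted list."""
    heap = []
    # Seed the heap with the first item of every non-empty list.
    # Each entry is (value, index of list, position inside that list).
    for idx, lst in enumerate(sorted_lists):
        if lst:
            heap.append((lst[0], idx, 0))
    heapq.heapify(heap)

    merged = []
    while heap:
        # The heap root is the smallest of the current list heads.
        value, idx, pos = heapq.heappop(heap)
        merged.append(value)
        nxt = pos + 1
        if nxt < len(sorted_lists[idx]):
            # Replace the consumed item with the next one from the same list.
            heapq.heappush(heap, (sorted_lists[idx][nxt], idx, nxt))
    return merged

result = k_way_merge([[1, 4, 9], [2, 3, 10], [0, 8]])
```

With k lists, each selection costs on the order of log k comparisons instead of the k - 1 comparisons a linear scan over the list heads would need.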
Sort-merge process:
Do while there are still records in the data file
    Do for i = 1 to k
        Do for j = 1 to k
            Fill in input buffer j;
            Sort the records in that buffer;
        enddo for j
        Merge records in the buffers, by transferring to output buffer;
        Write records to output file i;
    enddo for i
enddo while
The output files now become input files.
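The sort-merge process above can be simulated in memory as a rough sketch. Everything here is an assumption made for illustration: the data file is a Python list, each "output file" is a list of runs, and the names `sort_merge_phase` and `buffer_size` do not come from the text. The standard `heapq.merge` stands in for the step that merges the k sorted buffers.

```python
import heapq

def sort_merge_phase(records, k, buffer_size):
    """Return k simulated output files, each a list of sorted runs."""
    output_files = [[] for _ in range(k)]
    pos = 0  # next unread record in the simulated data file
    while pos < len(records):
        for i in range(k):
            if pos >= len(records):
                break  # data file exhausted in the middle of a round
            buffers = []
            for _ in range(k):
                # Fill one input buffer and sort it internally.
                chunk = records[pos:pos + buffer_size]
                pos += len(chunk)
                if chunk:
                    buffers.append(sorted(chunk))
            # Merge the sorted buffers into one run, written to output file i.
            run = list(heapq.merge(*buffers))
            output_files[i].append(run)
    return output_files

files = sort_merge_phase([9, 3, 7, 1, 8, 2, 6, 4, 5, 0, 11, 10], k=2, buffer_size=2)
```

Each run holds up to k * buffer_size records, and the runs are dealt out to the k output files in turn, so the files are ready to serve as inputs for the merge passes.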
Merge process:
First merge pass:
do for j = 1 to k
{
    do for i = 1 to k
        Fill in input buffer i with records from the input file i;
    enddo for i
    /* We merge the records in the input buffers, transfer them to
       the next output buffer, and write them to output file j.
    */
    /* Merging proceeds by selecting the record
       with the smallest key from all the input buffers, writing it to the
       next available output buffer, and removing it from the input buffer.
       This process of selection continues, following these two rules:
       A) When a merge output buffer is filled, it is written to the merge
          output file j.
       B) If a merge input buffer has been completely processed, that
          merge input buffer will be ignored during the selection. If only one
          merge input buffer contains records, copy them to the output buffer,
          and to the file.
    */
}
enddo for j
Repeat the process until the input files have been completely read.
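The selection step with rules A and B can be sketched as follows. This is a simplified in-memory model, not a definitive implementation: the function name `merge_buffers` and the parameter `out_capacity` are hypothetical, the output file is modelled as a list of written blocks, and the smallest key is found with a plain linear scan over the active buffers.

```python
def merge_buffers(input_buffers, out_capacity):
    """Merge sorted input buffers; return the list of blocks written out."""
    buffers = [list(b) for b in input_buffers]  # work on copies
    heads = [0] * len(buffers)                  # read position per buffer
    out_buffer, out_file = [], []

    def active():
        # Rule B: exhausted buffers are ignored during selection.
        return [i for i in range(len(buffers)) if heads[i] < len(buffers[i])]

    live = active()
    while live:
        # Select the buffer whose current record has the smallest key.
        i = min(live, key=lambda j: buffers[j][heads[j]])
        out_buffer.append(buffers[i][heads[i]])
        heads[i] += 1
        if len(out_buffer) == out_capacity:
            # Rule A: a full output buffer is written to the output file.
            out_file.append(out_buffer)
            out_buffer = []
        live = active()
    if out_buffer:
        out_file.append(out_buffer)  # flush the last, partly filled buffer
    return out_file

blocks = merge_buffers([[1, 5], [2, 3], [4]], out_capacity=2)
```

The shortcut in rule B (copying the last remaining buffer directly) is not coded separately here, because the general loop already handles a single active buffer correctly.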
Next merge passes:
The output files of the previous merge pass now become the input files for the
next merge pass.
/* The files now contain sorted sequences that are k times longer than they
were before the merge, so we have to take that into account in the successive
merge passes.
*/
We now do a selection similar to the one in the first pass, but we modify the
second rule for merging:
B') If a merge input buffer has been processed, that buffer is refilled using
the next block from the same merge input file (until the entire sorted sequence
has been exhausted). The input buffer can be ignored only when the whole sorted
sequence has been used. When only one merge input sequence remains, the
remaining records can be written directly to the merge output file.
A new set of sorted sequences of records is then ready to start the process
again. We use the next output file to write the results of merging the new
sorted sequences.
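Rule B' can be illustrated with a small sketch in which only one block of each sorted sequence is held in memory at a time. The names `merge_runs_blockwise` and `block_size` are invented for this example, and each sorted sequence is modelled as a Python list that is read in slices rather than as a real file.

```python
def merge_runs_blockwise(runs, block_size):
    """Merge sorted runs, holding only one block of each run at a time."""
    offsets = [0] * len(runs)     # next unread position in each run
    buffers = [[] for _ in runs]  # current in-memory block per run

    def refill(i):
        # Rule B': load the next block of the same run, if any remains.
        buffers[i] = list(runs[i][offsets[i]:offsets[i] + block_size])
        offsets[i] += len(buffers[i])

    for i in range(len(runs)):
        refill(i)

    merged = []
    while True:
        # A buffer is ignored only when its whole run has been used up.
        live = [i for i, b in enumerate(buffers) if b]
        if not live:
            break
        i = min(live, key=lambda j: buffers[j][0])
        merged.append(buffers[i].pop(0))
        if not buffers[i]:
            refill(i)  # stays empty only when the run is exhausted
    return merged

merged = merge_runs_blockwise([[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]], block_size=2)
```

The memory needed depends on the number of runs and the block size, not on the length of the runs, which is what lets later passes merge sequences far longer than a buffer.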
Merge termination:
The process ends when a single sorted sequence, as long as the data file, has
been generated. It will be the result of outputting to a single file.
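The repetition of merge passes until one sequence remains can be sketched as a short driver loop. The function name `merge_until_single_run` is hypothetical, runs are kept in a plain list rather than spread over files, and the standard `heapq.merge` is used for each k-way merge.

```python
import heapq

def merge_until_single_run(runs, k):
    """Repeatedly merge groups of k runs; return (sorted_run, passes)."""
    passes = 0
    while len(runs) > 1:
        # One merge pass: every group of up to k runs becomes one longer run.
        runs = [list(heapq.merge(*runs[i:i + k]))
                for i in range(0, len(runs), k)]
        passes += 1
    return (runs[0] if runs else []), passes

runs = [[5, 9], [1, 7], [3, 8], [2, 6], [0, 4]]
sorted_run, passes = merge_until_single_run(runs, k=2)
```

Each pass cuts the number of runs by roughly a factor of k, so r initial runs need about log base k of r passes, rounded up. In the example, five runs merged two at a time shrink to 3, then 2, then 1, which takes three passes.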
Step 3: Merge:
From Tb1, Tb2 to Ta1, Ta2.
Merge the first run on Tb1 and the first run on Tb2, and store the result on
Ta1:
    Read two records into main memory, compare, and store the smaller on Ta1.
    Read the next record (from Tb1 or Tb2, whichever tape contained the record
    just stored on Ta1), compare, store on Ta1, etc.
Merge the second run on Tb1 and the second run on Tb2, store the result on Ta2.
Merge the third run on Tb1 and the third run on Tb2, store the result on Ta1.
Etc., storing the results alternately on Ta1 and Ta2.
Now Ta1 and Ta2 contain sorted runs twice the size of the previous runs on
Tb1 and Tb2.
From Ta1, Ta2 to Tb1, Tb2.
Merge the first run on Ta1 and the first run on Ta2, and store the result on
Tb1.
Merge the second run on Ta1 and the second run on Ta2, store the result on Tb2.
Etc., merging and storing alternately on Tb1 and Tb2.
Repeat the process until only one run is obtained. This is the sorted file.
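The back-and-forth between the two tape pairs can be simulated as a minimal sketch. Here each tape is assumed to be a Python list of sorted runs, and the function names `merge_pass` and `tape_merge_sort` are made up for the example. The standard `heapq.merge` performs the record-by-record comparison of two runs.

```python
import heapq

def merge_pass(src1, src2):
    """Merge run i of src1 with run i of src2, alternating the target tape."""
    dst = ([], [])
    for i in range(max(len(src1), len(src2))):
        run1 = src1[i] if i < len(src1) else []
        run2 = src2[i] if i < len(src2) else []
        # Even-numbered results go to the first target tape, odd to the second.
        dst[i % 2].append(list(heapq.merge(run1, run2)))
    return dst

def tape_merge_sort(tb1, tb2):
    """Ping-pong between the two tape pairs until only one run is left."""
    t1, t2 = tb1, tb2
    while len(t1) + len(t2) > 1:
        # The target tapes of this pass become the source tapes of the next.
        t1, t2 = merge_pass(t1, t2)
    remaining = t1 or t2
    return remaining[0] if remaining else []

sorted_file = tape_merge_sort([[3, 9], [1, 7]], [[2, 8], [4, 6]])
```

In the example, the first pass leaves one run of four records on each target tape, and the second pass merges those two into the final run of eight records.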
symbol table
A symbol table is a data structure for storing a list of
items