
Uploaded by Anirban Ray

Quick sort is a very fast sorting algorithm, but because it is recursive, its call overhead on "small" arrays makes it take about as long as insertion sort there. Insertion sort is iterative, works by shifting elements, and performs well on small arrays that are "almost sorted". We therefore design a hybrid algorithm that recursively partitions the array as quick sort does until the sub-arrays reach an optimum cutoff size, determined by a simulation study, and then applies insertion sort.


Runtime Comparison

Anirban Ray

23 February 2018

Objective

Quick Sort and Insertion Sort are two of the most common sorting algorithms. Quick sort is very popular because it is among the fastest general-purpose sorting algorithms in practice and has very good average-case run-time. Insertion sort, on the other hand, works very well when the array is partially sorted and not too large. In this project, we try to combine the two algorithms so as to exploit both the speed of quick sort and the effectiveness of insertion sort on small, nearly sorted inputs. We then look for the hybrid algorithm (a combination of insertion sort and quick sort) that is optimum in the sense of minimum average run-time.

Insertion Sort

Insertion sort is an iterative sorting algorithm. At each iteration, it removes one element, finds its position within the already sorted array of the preceding elements, and inserts it there. The algorithm can be written as below (array indices start at 1):

INSERTIONSORT(A)
    for j = 2 to A.length
        key = A[j]
        i = j - 1
        while i > 0 and A[i] > key
            A[i + 1] = A[i]
            i = i - 1
        A[i + 1] = key
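To see why insertion sort suits partially sorted input, it helps to count how many element shifts the inner loop performs: one per inversion (out-of-order pair) in the input. The following is a minimal standalone sketch, not the code used later in this project; the function name `insertion_sort_count_shifts`, the 0-based indexing and the use of `std::vector<double>` are assumptions made for the illustration.

```cpp
#include <cstddef>
#include <vector>

// Sorts `a` in place with insertion sort (0-based indices) and returns the
// number of element shifts performed, which equals the number of inversions
// in the input.
long long insertion_sort_count_shifts(std::vector<double>& a) {
    long long shifts = 0;
    for (std::size_t j = 1; j < a.size(); ++j) {
        double key = a[j];
        std::size_t i = j;
        // shift larger elements of the sorted prefix one place to the right
        while (i > 0 && a[i - 1] > key) {
            a[i] = a[i - 1];
            --i;
            ++shifts;
        }
        a[i] = key;
    }
    return shifts;
}
```

An already sorted array needs 0 shifts, an array with a single adjacent pair out of order needs 1, and a reversed array of 5 elements needs 10, the maximum possible for that size.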

Quick Sort

Quick sort is a divide-and-conquer algorithm. It first divides a large array into two sub-arrays with respect to a pivot element, such that no element of one sub-array is greater than the pivot and no element of the other is less than it. It then does the same for the two sub-arrays and continues until all sub-arrays are of size 1. Since these sub-arrays are trivially sorted and already lie in the correct relative order, the whole array is then sorted in place, with no merging step. The algorithm to sort the pth to rth elements of the array A is as follows.

QUICKSORT(A, p, r)
    if p < r
        q = PARTITION(A, p, r)
        QUICKSORT(A, p, q)
        QUICKSORT(A, q + 1, r)

PARTITION(A, p, r)
    x = A[p]
    i = p - 1
    j = r + 1
    while TRUE
        repeat
            j = j - 1
        until A[j] <= x
        repeat
            i = i + 1
        until A[i] >= x
        if i < j
            exchange A[i] with A[j]
        else return j
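A small worked example may clarify what PARTITION returns. The sketch below transcribes the pseudocode into standalone C++ with 0-based inclusive bounds; the names `hoare_partition` and `is_valid_split` and the use of `std::vector<int>` are assumptions made for the illustration, not part of the project code.

```cpp
#include <utility>
#include <vector>

// Hoare partition with the first element as pivot, on a[p..r] (inclusive).
// Returns q such that every element of a[p..q] is <= every element of
// a[q+1..r].
int hoare_partition(std::vector<int>& a, int p, int r) {
    int x = a[p];
    int i = p - 1;
    int j = r + 1;
    while (true) {
        do { --j; } while (a[j] > x);
        do { ++i; } while (a[i] < x);
        if (i < j) {
            std::swap(a[i], a[j]);
        } else {
            return j;
        }
    }
}

// Checks the post-condition: no element of a[p..q] exceeds any of a[q+1..r].
bool is_valid_split(const std::vector<int>& a, int p, int q, int r) {
    for (int i = p; i <= q; ++i) {
        for (int j = q + 1; j <= r; ++j) {
            if (a[i] > a[j]) return false;
        }
    }
    return true;
}
```

For the input {5, 3, 8, 1, 9, 2} with bounds 0 and 5, the call returns 2 and leaves the array as {2, 3, 1, 8, 9, 5}. Note that the pivot 5 does not end up at the boundary between the two parts, which is why the recursion is on A[p..q] and A[q+1..r] rather than excluding position q.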

Different choices of the pivot element suit different types of input array. In the above-mentioned algorithm, we have used the first element of the array. Lomuto's scheme uses the last element of the array. Sometimes a random index is chosen, the element there is swapped with the last element, and the Lomuto partitioning method is then followed. Singleton used the median-of-three method, where one first sorts the first, middle-most and last elements of the array, then exchanges the middle-most element of the modified array with the first element and proceeds as before. In this project, we always use random inputs, in which case the choice of pivot does not matter much. So, we continue to use the first element as pivot, following Hoare, who first proposed the quick sort algorithm.
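The median-of-three rule described above can be sketched as follows. This is only an illustration and is not used in the rest of this project; the function name `median_of_three_to_front`, the 0-based inclusive bounds and the use of `std::vector<double>` are assumptions made for the example. It would be called on a sub-array just before a first-element-pivot partition.

```cpp
#include <utility>
#include <vector>

// Puts the median of a[lo], a[mid], a[hi] into a[lo], so that a partition
// routine that takes the first element as pivot uses the median of three.
void median_of_three_to_front(std::vector<double>& a, int lo, int hi) {
    int mid = lo + (hi - lo) / 2;
    // sort the three sampled elements in place
    if (a[mid] < a[lo]) std::swap(a[mid], a[lo]);
    if (a[hi] < a[lo]) std::swap(a[hi], a[lo]);
    if (a[hi] < a[mid]) std::swap(a[hi], a[mid]);
    // now a[lo] <= a[mid] <= a[hi]; move the median to the front
    std::swap(a[lo], a[mid]);
}
```

For {9, 1, 5, 7, 3} with bounds 0 and 4, the sampled elements are 9, 5 and 3, and the call leaves the array as {5, 1, 3, 7, 9}, with the median 5 in the pivot position.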

Hybrid Sort

Now we come to the formulation of the new hybrid algorithm. Since insertion sort works well on small and on partially sorted arrays, we start the sorting procedure with the partition approach of the quick sort algorithm. But instead of continuing until we reach sub-arrays of one element each, we stop partitioning a sub-array as soon as its size is no larger than a given cut-off size, which distinguishes between small and large arrays. After this step is completed, the array consists of sub-arrays of size less than or equal to the cut-off, which are not sorted internally but are in sorted order relative to one another. Finally, we run insertion sort once over the entire array to get the completely sorted output. The algorithm is the following.

HYBRIDSORT(A, p, r, k)
    if p < r
        if r - p + 1 > k
            q = PARTITION(A, p, r)
            HYBRIDSORT(A, p, q, k)
            HYBRIDSORT(A, q + 1, r, k)

After the top-level call HYBRIDSORT(A, 1, A.length, k) returns, run once:

INSERTIONSORT(A)
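The claim that the partitioning phase leaves blocks that are unsorted internally but ordered relative to one another can be checked with a small standalone sketch. It is not the project code: it uses `std::vector<int>` instead of Rcpp, and the names `hoare_partition`, `partition_phase`, `max_displacement` and `shuffled_permutation` are hypothetical. The check assumes the input is a permutation of 0, 1, ..., n - 1, so that each value equals its own sorted index.

```cpp
#include <algorithm>
#include <cstdlib>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hoare partition with the first element as pivot (0-based, inclusive bounds).
int hoare_partition(std::vector<int>& a, int start, int end) {
    int pivot = a[start];
    int i = start - 1;
    int j = end + 1;
    while (true) {
        do { ++i; } while (a[i] < pivot);
        do { --j; } while (a[j] > pivot);
        if (i >= j) return j;
        std::swap(a[i], a[j]);
    }
}

// Partitioning phase only: stop once a block has at most `cutoff` elements.
void partition_phase(std::vector<int>& a, int start, int end, int cutoff) {
    if (start < end && end - start + 1 > cutoff) {
        int q = hoare_partition(a, start, end);
        partition_phase(a, start, q, cutoff);
        partition_phase(a, q + 1, end, cutoff);
    }
}

// Largest distance between an element's index and its sorted index.
// Valid only when `a` is a permutation of 0, 1, ..., n - 1.
int max_displacement(const std::vector<int>& a) {
    int worst = 0;
    for (int i = 0; i < static_cast<int>(a.size()); ++i) {
        worst = std::max(worst, std::abs(a[i] - i));
    }
    return worst;
}

// Shuffled permutation of 0, 1, ..., n - 1 with a fixed seed.
std::vector<int> shuffled_permutation(int n, unsigned seed) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 0);
    std::mt19937 gen(seed);
    std::shuffle(a.begin(), a.end(), gen);
    return a;
}
```

After `partition_phase` with cut-off k, every element lies inside the block that contains its final position, so `max_displacement` is always below k. The final insertion sort pass therefore never moves an element by k or more places, whatever the array size.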

We first define the sorting algorithms in C++ using the Rcpp package.

#include <Rcpp.h>
using namespace Rcpp;

// exchange the elements at two positions of the array
void swap(NumericVector array, int first_position, int second_position) {
    double temporary = array[first_position];
    array[first_position] = array[second_position];
    array[second_position] = temporary;
}

// Hoare partition of array[start..end] using the first element as pivot;
// returns the index at which the array is split
int partition(NumericVector array, int start, int end) {
    double pivot = array[start];
    int i = (start - 1);
    int j = (end + 1);
    while (true) {
        do {
            i = (i + 1);
        } while (array[i] < pivot);
        do {
            j = (j - 1);
        } while (array[j] > pivot);
        if (i >= j) {
            return j;
        }
        swap(array, i, j);
    }
}

// insertion sort of array[start..end]
void insertion(NumericVector array, int start, int end) {
    if (start < end) {
        for (int i = (start + 1); i <= end; ++i) {
            double temporary = array[i];
            int j = (i - 1);
            // shift larger elements one place to the right
            while ((j >= start) && (array[j] > temporary)) {
                array[(j + 1)] = array[j];
                j = (j - 1);
            }
            array[(j + 1)] = temporary;
        }
    }
}

// quick sort of array[start..end]
void quick(NumericVector array, int start, int end) {
    if (start < end) {
        int key = partition(array, start, end);
        quick(array, start, key);
        quick(array, (key + 1), end);
    }
}

// partitioning phase of hybrid sort; leaves blocks of size at most cutoff
void hybrid(NumericVector array, int start, int end, int cutoff) {
    if (start < end) {
        // applying partition algorithm only when array size is more than cutoff
        if ((end - start + 1) > cutoff) {
            int key = partition(array, start, end);
            hybrid(array, start, key, cutoff);
            hybrid(array, (key + 1), end, cutoff);
        }
    }
}

// [[Rcpp::export]]
NumericVector sorting_R(NumericVector array, char method, int cutoff) {
    int n = array.length();
    // making an explicit copy of the input array to keep that unchanged
    NumericVector sorted_array = clone(array);
    // applying different sorting algorithms based on method
    switch (method) {
        case 'h': {
            // partition down to the cutoff, then one insertion pass over everything
            hybrid(sorted_array, 0, (n - 1), cutoff);
            insertion(sorted_array, 0, (n - 1));
            break;
        }
        case 'i': {
            insertion(sorted_array, 0, (n - 1));
            break;
        }
        case 'q': {
            quick(sorted_array, 0, (n - 1));
            break;
        }
        default: {
            Rcpp::stop("Permissible methods are Hybrid(h), Insertion(i) and Quick(q).");
        }
    }
    return sorted_array;
}

Now that we have defined our sorting algorithms, the next step is to find the optimum choice of the cut-off by a simulation study, since it is not known in advance and the notion of “small” is rather vague. We therefore define functions in R (calling the C++ functions) to compute the average run-time of our hybrid algorithm for a given cut-off array size. We run these functions over different cut-off sizes for different array sizes and plot the average run-times against the cut-offs for each array size, as below.

# function to calculate required time to sort a particular input array using
# a user defined cutoff
single_hybrid_runtime <- function(array_to_be_sorted, cutoff_to_be_used) {
  system.time(sorting_R(array_to_be_sorted, "h", cutoff_to_be_used))["user.self"]
}

# function to calculate runtimes for a simulated array of a
# particular size using different choices of cutoff
comparative_hybrid_runtime <- function(array_size, cutoff) {
  simulated_array <- rnorm(array_size)
  sapply(cutoff, single_hybrid_runtime, array_to_be_sorted = simulated_array)
}

# function to calculate average runtime for user defined array size for
# different choices of cutoff, average being taken over different
# replications (optionally user defined)
average_hybrid_runtime <- function(array_size, cutoff, replication = 25) {
  rowMeans(replicate(replication, comparative_hybrid_runtime(array_size, cutoff)))
}

keys <- seq(1, 1000, 1) # choices of cutoff used for simulation study

times_1_e_5 <- average_hybrid_runtime(array_size = 1e+05, cutoff = keys)
times_4_e_5 <- average_hybrid_runtime(array_size = 4e+05, cutoff = keys)
times_7_e_5 <- average_hybrid_runtime(array_size = 7e+05, cutoff = keys)
times_1_e_6 <- average_hybrid_runtime(array_size = 1e+06, cutoff = keys)

plot(keys, times_1_e_5, type = "o", main = "For array size 1e+05", xlab = "Cutoff Used",
     ylab = "Time Taken")
plot(keys, times_4_e_5, type = "o", main = "For array size 4e+05", xlab = "Cutoff Used",
     ylab = "Time Taken")
plot(keys, times_7_e_5, type = "o", main = "For array size 7e+05", xlab = "Cutoff Used",
     ylab = "Time Taken")
plot(keys, times_1_e_6, type = "o", main = "For array size 1e+06", xlab = "Cutoff Used",
     ylab = "Time Taken")

Observations from the Graphs

Firstly, we see a sharp initial fall in all the graphs. This shows the effectiveness of the hybrid algorithm over quick sort: with a cut-off of 1, we are essentially applying quick sort over the entire array, since the final insertion pass then finds the array already sorted. The steep fall lets us conclude with confidence that combining the two algorithms is worthwhile. The reason is that quick sort, being recursive, incurs too much overhead from calling itself repeatedly on small arrays.

Secondly, we note that beyond a certain point the average run-time increases steadily, because insertion sort is effective only for “small” arrays. As the cut-off increases, insertion sort has to be applied to larger unsorted sub-arrays, and hence sorting the entire array becomes slower.

Finally, we observe that the trade-off between these two opposing effects on run-time is balanced in the lower part of the skewed U-shaped pattern, which appears in all the graphs to a greater or lesser extent.

Therefore, based on the simulation study, we conclude that the optimum choice of cut-off lies in the range from 100 to 200. Based on our reading of the graphs, we subjectively choose 140 as the cut-off in the later sections, without any analytical justification.
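A rough cost model may help to explain the U-shaped pattern. It is only a sketch under assumptions that are not part of the simulation study: the partitioning phase is taken to have about log2(n/k) levels, each touching all n elements at a cost of a per element, and each of the roughly n/k blocks left for insertion sort is taken to be in random order, so that it holds about k(k - 1)/4 inversions at a cost of b per shift. The constants a and b are hypothetical and machine-dependent.

```latex
% Rough cost model (assumption): a, b > 0 are machine-dependent constants.
T(n, k) \approx a\, n \log_2\!\frac{n}{k} + b\, \frac{n (k - 1)}{4}

% Setting the derivative with respect to the cut-off k to zero:
\frac{\partial T}{\partial k} = -\frac{a\, n}{k \ln 2} + \frac{b\, n}{4} = 0
\quad\Longrightarrow\quad
k^{*} = \frac{4a}{b \ln 2}
```

The first term falls and the second rises as k grows, which gives the U shape. Under this model the minimising cut-off does not depend on n, which would be consistent with a single cut-off range serving all four array sizes. The model does not predict the numerical value of the optimum, since a and b are unknown.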

Now, a perfectly reasonable question is how much we gain from this algorithm, or whether we gain at all. We have already shown in the previous section that the run-time of the hybrid method is significantly better than that of quick sort. Now, we wish to see whether this improvement varies with the size of the input array. For that purpose, we define a function to calculate the percentage improvement in run-time of hybrid sort over quick sort and plot the results.

# function to calculate percentage improvement of hybrid sort over quick sort
# for a user defined input size
single_improvement <- function(array_size) {
  x <- rnorm(array_size)
  hybrid_time <- system.time(sorting_R(x, "h", 140))["user.self"]
  quick_time <- system.time(sorting_R(x, "q", 140))["user.self"] # cutoff is ignored for "q"
  (quick_time - hybrid_time) * 100 / quick_time
}

# function to average the percentage improvement over replications
average_improvement <- function(length_of_array, replication = 50) {
  mean(replicate(replication, single_improvement(length_of_array)))
}

sizes <- seq(1e+05, 1e+07, 1e+05) # simulated sizes used for improvement calculation
improvement <- sapply(sizes, average_improvement)

plot(sizes, improvement, type = "o", xlab = "Array Size", ylab = "Percentage Improvement",
     main = "Improvement in Hybrid algorithm over Quick")

Explanation of Improvement Pattern

From the graph, it is evident that hybrid sort comfortably outperforms quick sort for all the array sizes considered. But the same graph also reveals that the improvement decreases as the array size increases, although the percentage improvement is still around 40%, which is very significant for practical purposes. This decreasing trend has a likely explanation. The hybrid method saves the recursive calls that quick sort spends on small sub-arrays and pays instead for one final insertion sort pass over the entire array. Because every element is then less than one cut-off length away from its sorted position, both the cost of that pass and the saving grow roughly in proportion to the array size, whereas the total run-time of quick sort grows slightly faster, like n log n. The saving is therefore a slowly shrinking fraction of the total as the array size increases.
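One rough way to see why the percentage improvement can shrink with array size is a simple cost model. It is a sketch under assumptions, not a result of the simulation: quick sort is taken to cost about a per element per recursion level, and the final insertion pass about b per shift with roughly (k - 1)/4 shifts per element for a fixed cut-off k. The constants a and b are hypothetical.

```latex
% Rough model (assumption): a, b > 0 constants, k the fixed cut-off.
T_{\text{quick}}(n) \approx a\, n \log_2 n, \qquad
T_{\text{hybrid}}(n, k) \approx a\, n \log_2\!\frac{n}{k} + b\, \frac{n (k - 1)}{4}

% Relative improvement of hybrid over quick:
\frac{T_{\text{quick}} - T_{\text{hybrid}}}{T_{\text{quick}}}
\approx \frac{a \log_2 k - b\,(k - 1)/4}{a \log_2 n}
```

The numerator does not depend on n, while the denominator grows like log2(n). Under this model, going from n = 1e+05 to n = 1e+07 multiplies the relative improvement by log(1e+05)/log(1e+07) = 5/7, a slow decline rather than a collapse.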

Summary

At the end of the project, we see that we have successfully improved quick sort by combining insertion sort with it. We have also provided an interval in which the optimum cut-off size should lie, and verified that hybrid sort consistently outperforms quick sort. Thus, this algorithm can be used as an alternative to the quick sort algorithm.

References

1. Introduction to Algorithms - Third Edition (https://mitpress.mit.edu/books/introduction-algorithms)

2. Wikipedia - Quick Sort (https://en.wikipedia.org/wiki/Quicksort)

3. Wikipedia - Insertion Sort (https://en.wikipedia.org/wiki/Insertion_sort)

4. Techie Delight - Hybrid QuickSort Algorithm (www.techiedelight.com/hybrid-quicksort)
