
Closest Pair

Presented on 5/26/12 by

Alekh Dwivedi(11210

Introduction

The Closest Pair problem consists of finding a pair of points p and q from a set of n points such that p and q are at minimum distance from each other. The brute force solution to this problem takes O(n^2) comparisons to check the distance between each possible pair of points.

Closest pair

Input: a set of points in the plane.
Output: the closest pair of points.

Each point in the plane has an (x, y) coordinate, so we can compute the distances among all of the points and find the closest pair among them. The distance between two points is the Euclidean distance:

Distance { (x1, y1), (x2, y2) } = Square root of ( (x1 - x2)^2 + (y1 - y2)^2 )

Naive approach

The question is: can we do it faster? Can we do better?

Naive method: brute force.

Check all pairs of points p and q with Θ(n^2) comparisons. Handling one point costs n-1 distance computations against the others, so the recurrence T(n) = T(n-1) + (n-1) leads to an O(n^2) solution. The n-1 extra work per step is the bottleneck: if it can be done much faster, we are in good shape.
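As a baseline, the brute force check can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `closest_pair_brute_force` and the sample points are assumptions, not part of the original slides.

```python
from itertools import combinations
from math import dist, inf

def closest_pair_brute_force(points):
    """Return (distance, (p, q)) for the closest pair, checking every pair."""
    best = (inf, None)
    for p, q in combinations(points, 2):    # n*(n-1)/2 pairs in total
        d = dist(p, q)                       # Euclidean distance
        if d < best[0]:
            best = (d, (p, q))
    return best

pts = [(0, 0), (5, 4), (3, 1), (9, 6), (4, 1)]   # hypothetical sample input
print(closest_pair_brute_force(pts))             # (1.0, ((3, 1), (4, 1)))
```

Every faster method discussed later can be checked against this function on small inputs.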


1-Dimension problem


1-D Divide & conquer

Notice that if the closest pair has one point contributed from each subset S1 and S2, then both of these points must be within d of the midpoint m.
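The observation above means that, on a line, the only cross pair worth checking is the largest point of S1 against the smallest point of S2. A minimal sketch of the 1-D divide and conquer, assuming the input is a list of numbers (the name `closest_pair_1d` is illustrative):

```python
from math import inf

def closest_pair_1d(xs):
    """Smallest gap between any two numbers in xs, by divide and conquer."""
    def solve(s):                          # s is already sorted
        if len(s) < 2:
            return inf
        mid = len(s) // 2
        left, right = s[:mid], s[mid:]     # S1 and S2, split at the median
        d = min(solve(left), solve(right))
        # Only the largest point of S1 and the smallest point of S2
        # can lie within d of the split and beat d.
        return min(d, right[0] - left[-1])
    return solve(sorted(xs))

print(closest_pair_1d([7, 1, 12, 3.5, 20]))   # 2.5
```

The merge step is a single comparison, which is what the 2-D algorithm tries to imitate.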


Introduction of 2-D
The brute force approach to the closest pair problem (i.e. checking every possible pair of points) takes quadratic time. We would now like to introduce a faster divide-and-conquer algorithm for solving the closest pair problem. Given a set of points S in the plane, our approach will be to split the set into two roughly equal halves (S1 and S2) for which we already have the solutions, and then to merge the halves in linear time to yield an O(n log n) algorithm. However, the actual solution is far from obvious. It is possible that the desired pair might have one point in S1 and one in S2; does this not force us once again to check all possible pairs of points? The divide-and-conquer approach presented here generalizes directly from the one-dimensional algorithm we presented in the previous section.


Closest pair in the plane

Up to now, we are completely in step with the 1-D case. At this point, however, the extra dimension causes some problems. We wish to determine if some point in, say, P1 is less than d away from another point in P2. However, in the plane, we don't have the luxury that we had on the line, where we observed that only one point in each set can be within d of the median. In fact, in two dimensions, all of the points could be in the strip! This is disastrous, because we would have to compare n^2 pairs of points to merge the set, and hence our divide-and-conquer ...

Divide & conquer in 2-D


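For a point p in one strip, only the following points in the other strip need to be checked:

1. those points within d of p in the direction of the other strip
2. those within d of p in the positive and negative y directions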

Divide & conquer in 2-D


For a point p in P1, which portion of P2 should be checked? We only need to check the points that are within d of p. Thus we can limit the portion of P2. The points to consider for a point p must lie within the d x 2d rectangle R. At most, how many points are there in rectangle R?

[Figure: the point sets S1 and S2]

Contd

However, since we sorted the points in the strip by their y-coordinates, the process of merging our two subsets is not linear, but in fact takes O(n log n) time.

Hence our full algorithm is not yet O(n log n), but it is still an improvement on the quadratic performance of the brute force approach (as we shall see in the next section). We will then demonstrate how to make this algorithm even more efficient by strengthening our recursive sub-solution.

Summary and analysis of 2D algo


We present here a step by step summary of the algorithm presented in the previous section, followed by a performance analysis. The algorithm is simply written in list form because I find pseudo-code to be burdensome and unnecessary when trying to understand an algorithm. Note that we pre-sort the points according to their x-coordinates, which in itself takes O(n log n) time.

Closest Pair of a set of points:
1. Divide the set into two equal sized parts by the line l, and recursively compute the minimal distance in each part.
2. Let d be the minimal of the two minimal distances.
3. Eliminate points that lie farther than d apart from l.
4. Sort the remaining points according to their y-coordinates.
5. Scan the remaining points in the y order and compute the distances of each point to its five neighbours.
6. If any of these distances is less than d then update d.
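The six steps above can be sketched in Python as follows. This is a hedged illustration rather than the presenter's code: the name `closest_distance` is an assumption, the base case for three or fewer points uses brute force, and step 5 scans up to 7 following points (a conservative bound used later in these slides) instead of five.

```python
from math import dist, inf

def closest_distance(points):
    """Closest-pair distance; re-sorts the strip by y at every level."""
    def solve(px):                                  # px is sorted by x
        n = len(px)
        if n <= 3:                                  # small case: brute force
            return min((dist(px[i], px[j])
                        for i in range(n) for j in range(i + 1, n)),
                       default=inf)
        mid = n // 2
        x_l = px[mid][0]                            # the dividing line l
        d = min(solve(px[:mid]), solve(px[mid:]))   # steps 1 and 2
        strip = [p for p in px if abs(p[0] - x_l) < d]   # step 3
        strip.sort(key=lambda p: p[1])              # step 4
        for i, p in enumerate(strip):               # step 5
            for q in strip[i + 1:i + 8]:            # assumed bound: 7 followers
                if q[1] - p[1] >= d:
                    break
                d = min(d, dist(p, q))              # step 6
        return d
    return solve(sorted(points))

print(closest_distance([(0, 0), (0, 10), (1, 0), (1, 10)]))   # 1.0
```

In the sample call the closest pair straddles the dividing line, so it is found only in the strip scan.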

Summary and analysis of 2D algo


Steps 2-6 define the merging process, which must be repeated log n times because this is a divide-and-conquer algorithm:
1. Step 2 takes O(1) time
2. Step 3 takes O(n) time
3. Step 4 is a sort that takes O(n log n) time
4. Step 5 takes O(n) time (as we saw in the previous section)
5. Step 6 takes O(1) time

Hence the merging of the sub-solutions is dominated by the sorting at step 4, and hence takes O(n log n) time. This must be repeated once for each level of recursion in the divide-and-conquer algorithm, hence the whole of algorithm Closest Pair takes O(log n * n log n) = O(n log^2 n) time.

Improving the algorithm


We can improve on this algorithm slightly by reducing the time it takes to achieve the y-coordinate sorting in Step 4. This is done by asking that the recursive solution computed in Step 1 returns the points in sorted order by their y-coordinates. This will yield two sorted lists of points which need only be merged (a linear time operation) in Step 4 in order to yield a complete sorted list. Hence the revised algorithm involves making the following changes:

Step 1: Divide the set into..., and recursively compute the distance in each part, returning the points in each set in sorted order by y-coordinate.
Step 4: Merge the two sorted lists into one sorted list in O(n) time.

Hence the merging process is now dominated by the linear time steps, thereby yielding an O(n log n) algorithm for finding the closest pair of a set of points in the plane.
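A possible sketch of the revised Steps 1 and 4: each recursive call returns its points sorted by y along with the distance, and the two lists are merged instead of re-sorted. The name `closest_distance_fast` is illustrative, and `heapq.merge` from the standard library stands in here for a hand-written two-way merge.

```python
from heapq import merge
from math import dist, inf

def closest_distance_fast(points):
    """Closest-pair distance; each call also returns its points sorted by y."""
    def by_y(p):
        return p[1]

    def solve(px):                                   # px is sorted by x
        n = len(px)
        if n <= 3:                                   # small case: brute force
            d = min((dist(px[i], px[j])
                     for i in range(n) for j in range(i + 1, n)),
                    default=inf)
            return d, sorted(px, key=by_y)
        mid = n // 2
        x_l = px[mid][0]                             # the dividing line l
        d_left, y_left = solve(px[:mid])             # revised Step 1
        d_right, y_right = solve(px[mid:])
        d = min(d_left, d_right)
        ys = list(merge(y_left, y_right, key=by_y))  # revised Step 4: merge
        strip = [p for p in ys if abs(p[0] - x_l) < d]
        for i, p in enumerate(strip):
            for q in strip[i + 1:i + 8]:             # assumed bound: 7 followers
                if q[1] - p[1] >= d:
                    break
                d = min(d, dist(p, q))
        return d, ys
    return solve(sorted(points))[0]
```

Filtering the merged list keeps the strip in y order, so no sort appears inside the recursion.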

Divide and conquer

Divide: split P into two subsets (according to x-coordinate): PL <= l <= PR. O(n)
Conquer: recurse on each half.

Get dL, dR. 2T(n/2)

Combine:

Select the closer pair of the above: d = min{dL, dR}. O(1)
Find any closer pair with one point in PL and the other in PR.

Create an array Y' of the points within the 2d-wide vertical strip, sorted by y-coordinate. O(n lg n) or O(n)

For each point in Y', compare it with its following 7 points. O(7n)

Sort points according to coordinates

Sorting the points by x-coordinate simplifies the division. Sorting by y-coordinate simplifies computing the distances of cross pairs.

Sorting inside the recursive function will not work: each sort costs O(n lg n), so the total cost is O(n lg^2 n).

Instead, pre-sort all the points; then the half division will keep the points sorted.

CLOSEST_PAIR(P, X, Y)

P: set of points, X: P sorted by x-coordinate, Y: P sorted by y-coordinate.

Divide P into PL and PR, X into XL and XR, Y into YL and YR.

d1 = CLOSEST_PAIR(PL, XL, YL)   // T(n/2)
d2 = CLOSEST_PAIR(PR, XR, YR)   // T(n/2)
d = min(d1, d2)

Combine:

Compute the minimum distance between the points pl in PL and pr in PR.   // O(n)

Form Y', which is the points of Y within the 2d-wide vertical strip. For each point p in Y', examine the 7 following points.
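The division step is where the pre-sorting pays off: Y can be split into YL and YR in one pass without re-sorting. A small sketch under the assumption that X and Y hold indices into the point list (the helper name `split_presorted` is hypothetical):

```python
def split_presorted(X, Y):
    """Split index lists X (x order) and Y (y order) into halves in O(n)."""
    mid = len(X) // 2
    XL, XR = X[:mid], X[mid:]
    in_left = set(XL)                           # membership test for PL
    YL = [i for i in Y if i in in_left]         # one pass keeps the y order
    YR = [i for i in Y if i not in in_left]
    return (XL, YL), (XR, YR)

points = [(2, 3), (12, 30), (40, 50), (5, 1)]   # hypothetical sample input
X = sorted(range(len(points)), key=lambda i: points[i][0])
Y = sorted(range(len(points)), key=lambda i: points[i][1])
print(split_presorted(X, Y))   # (([0, 3], [3, 0]), ([1, 2], [1, 2]))
```

Using indices rather than coordinates keeps the split well defined even when two input points coincide.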


In summary

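Sorting inside the recursion:

T(n) = O(1), if n <= 3
T(n) = 2T(n/2) + O(n lg n), otherwise

which gives T(n) = O(n lg^2 n).

With pre-sorting:

T'(n) = O(n lg n) + T(n), where T(n) = 2T(n/2) + O(n)

So O(n lg n) + O(n lg n) = O(n lg n).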
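Both bounds can be checked by unrolling the recurrences. The derivation below is a sketch that assumes n is a power of two, T(1) = O(1), and combine costs of exactly c n lg n and c n for some constant c:

```latex
\begin{align*}
T(n) &= 2T(n/2) + c\,n \lg n
      = \sum_{i=0}^{\lg n - 1} 2^i \cdot c\,\frac{n}{2^i} \lg\frac{n}{2^i} + n\,T(1) \\
     &= c\,n \sum_{i=0}^{\lg n - 1} (\lg n - i) + O(n)
      = c\,n \cdot \frac{\lg n\,(\lg n + 1)}{2} + O(n)
      = O(n \lg^2 n), \\[4pt]
T(n) &= 2T(n/2) + c\,n
      = \sum_{i=0}^{\lg n - 1} 2^i \cdot c\,\frac{n}{2^i} + n\,T(1)
      = c\,n \lg n + O(n)
      = O(n \lg n).
\end{align*}
```

Each of the lg n levels costs c n in the second case, while in the first case the per-level cost shrinks only from c n lg n down to c n, which is where the extra lg n factor comes from.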

Improvements: Comparing 5 not 7? Does not pre-sort Y? Different distance definition? Three dimensions?


A Plane Sweep Algorithm

Introduction

We are given a set S of n points in the plane, but this time we shall attempt to use the plane sweep technique. We sweep a vertical line across the set from left to right, keeping track of the closest pair seen so far. We shall describe an O(n log n) algorithm.

Algorithm and Performance

The algorithmic technique we shall use is the plane sweep method. This means that we will be sweeping a vertical line across the set of points, keeping track of certain data, and performing certain actions every time a point is encountered during the plane sweep. As we sweep the line, we will maintain the following data:

Diagram for Plane Sweep algo

[Figure: Plane sweep technique for closest pair. Sweep line shown in red.]

Contd

Every time the sweep line encounters a point p, we will perform the following actions:

1. Remove the points further than d to the left of p from the ordered set D that is storing the points in the strip.
2. Determine the point on the left of p that is closest to it.

Contd

[Figure: Restriction of search to small bounding box, shown in light blue.]

Summary and analysis

1. Sort the set according to x-coordinates (this is necessary for a plane sweep approach to determine the coordinates of the next point), which takes O(n log n) time.
2. Insert and remove each point once from the ordered set D (each insertion takes O(log n) time).
3. Compare each point to a constant number of its neighbors.
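A compact sketch of the sweep follows, with the ordered set D kept as a Python list sorted by y and searched with `bisect`. This is an assumption made for brevity: list insertion and deletion cost O(n) in the worst case, so a balanced search tree would be needed to actually reach the stated O(log n) per update. The name `closest_distance_sweep` is illustrative.

```python
from bisect import bisect_left, insort
from math import dist, inf

def closest_distance_sweep(points):
    """Closest-pair distance by sweeping a vertical line from left to right."""
    pts = sorted(points)            # sweep order: by x-coordinate
    d = inf                         # closest distance seen so far
    strip = []                      # D: (y, x) of points in the strip, sorted by y
    tail = 0                        # index in pts of the oldest strip point
    for x, y in pts:
        # Remove the points further than d to the left of the sweep line.
        while pts[tail][0] < x - d:
            px, py = pts[tail]
            del strip[bisect_left(strip, (py, px))]
            tail += 1
        # Only strip points with y in [y - d, y + d] can be closer than d.
        lo = bisect_left(strip, (y - d, -inf))
        hi = bisect_left(strip, (y + d, inf))
        for qy, qx in strip[lo:hi]:
            d = min(d, dist((x, y), (qx, qy)))
        insort(strip, (y, x))       # the current point joins the strip
    return d

print(closest_distance_sweep([(0, 0), (0, 10), (1, 0), (1, 10)]))   # 1.0
```

The slice `strip[lo:hi]` plays the role of the small bounding box: it holds only points within d of the current point in both x and y.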

Proof of Correctness

Let {p1,...,pn} be the set of input points sorted by their x-coordinates. When the sweep line hits p2, the pair (p1,p2) will be the current closest pair with distance d = dist(p1,p2). Furthermore, we know that if p1 is one of the points that makes up the closest pair for the whole set, then the other point must be p2, since no other points in the set are closer to p1.

Contd

There are two cases for the position of p.

[CASE 1]: p is in the strip. In this case, p would have been within the bounding box checked by the algorithm (since it is less than d from q, as per our assumption), and hence would not have been missed.

[CASE 2]: p is not in the strip. In this case p must be to the left of the strip (since we assumed that it is to the left of ...


Contd

Now, if we include the current point pi, the algorithm will check and see if there is a point to the left of the sweep line within d of it and update accordingly. Hence when the sweep completes processing at a given point pi, d is the distance between the closest pair among the points p1,...,pi. Applying this result to the last point in the list shows that the algorithm is correct.

D-Dimensional Closest Pair Algorithm

1. Divide the input S into S1, S2 by the median hyperplane H normal to some axis.
2. Recursively compute d1, d2 for S1, S2.
3. Set d = min(d1, d2).
4. Let S' be the set of points that are within d of H, projected onto H.
5. Use the d-sparsity condition to recursively examine all pairs in S'; there are only O(n) pairs.

Contd

The recurrence for the final algorithm solves to: T(n, D) = O(n (log n)^(D-1))

Application

Hierarchical clustering. Traveling salesman heuristics. Traffic Control. Dynamic minimum spanning trees.
