Easy interview question got harder: given numbers 1..100, find the missing number(s)

asked 6 years ago

viewed 173298 times

active 1 month ago

I had an interesting job interview experience a while back. The question started really easy:

Q1: We have a bag containing numbers 1, 2, 3, ..., 100. Each number appears exactly once, so there are 100 numbers. Now one number is randomly picked out of the bag. Find the missing number.

I've heard this interview question before, of course, so I very quickly answered along the lines of:

A1: Well, the sum of the numbers 1 + 2 + 3 + ... + N is (N+1)(N/2) (see Wikipedia: sum of arithmetic series). For N = 100, the sum is 5050.

Thus, if all numbers are present in the bag, the sum will be exactly 5050. Since one number is missing, the sum will be less than this, and the difference is that number. So we can find that missing number in O(N) time and O(1) space.
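
The A1 idea can be sketched in a few lines of Python. The function name `find_one_missing` and the list representation of the bag are assumptions made for illustration; they are not part of the original question.

```python
def find_one_missing(bag, n):
    """Return the single number from 1..n that is absent from `bag`."""
    expected = n * (n + 1) // 2   # sum of 1..n; 5050 for n = 100
    return expected - sum(bag)    # one pass over the bag, one running total


bag = [i for i in range(1, 101) if i != 37]
print(find_one_missing(bag, 100))  # 37
```

Only the running total is kept, which is where the O(1) space claim comes from (ignoring the size of the integer itself).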

At this point I thought I had done well, but all of a sudden the question took an unexpected turn:

Q2: That is correct, but now how would you do this if TWO numbers are missing?

I had never seen/heard/considered this variation before, so I panicked and couldn't answer the question. The interviewer insisted on knowing my thought process, so I mentioned that perhaps we can get more information by comparing against the expected product, or perhaps doing a second pass after having gathered some information from the first pass, etc., but I really was just shooting in the dark rather than actually having a clear path to the solution.

The interviewer did try to encourage me by saying that having a second equation is indeed one way to solve the problem. At this point I was kind of upset (for not knowing the answer beforehand), and asked if this is a general (read: "useful") programming technique, or if it's just a trick/gotcha answer.

The interviewer's answer surprised me: you can generalize the technique to find 3 missing numbers. In fact, you can generalize it to find k missing numbers.

Qk: If exactly k numbers are missing from the bag, how would you find them efficiently?

This was a few months ago, and I still couldn't figure out what this technique is. Obviously there's an Ω(N) time lower bound since we must scan all the numbers at least once, but the interviewer insisted that the TIME and SPACE complexity of the solving technique (minus the O(N) time input scan) is defined in k, not N.

So the question here is simple:

How would you solve Q2?
How would you solve Q3?
How would you solve Qk?

Clarifications

Generally there are N numbers from 1..N, not just 1..100.

I'm not looking for the obvious set-based solution, e.g. using a bit set, encoding the presence/absence of each number by the value of a designated bit, therefore using O(N) bits in additional space. We can't afford any additional space proportional to N.

I'm also not looking for the obvious sort-first approach. This and the set-based approach are worth mentioning in an interview (they are easy to implement, and depending on N, can be very practical). I'm looking for the Holy Grail solution (which may or may not be practical to implement, but has the desired asymptotic characteristics nevertheless).

So again, of course you must scan the input in O(N), but you can only capture a small amount of information (defined in terms of k, not N), and must then find the k missing numbers somehow.

Tags: algorithm, math

asked Aug 16 '10 at 10:26 by polygenelubricants; edited Dec 28 '15 at 21:37 by AndyG

88 Did you get the job? :) Erik B Aug 16 '10 at 12:58

37 @Erik: This was a screening question at the job fair, I got an invite for an on-site interview afterward. @Dimitris: I have dreams, and regardless of whether or not they're realistic, I must work toward them. polygenelubricants Aug 16 '10 at 13:08

116 Panicking, getting upset and finally criticizing the worth of the question, when you realize you can't answer it, has no doubt given your interviewer exactly the sort of insight they wanted. The reason they insisted on knowing your thought process is because that is far more important in judging your suitability for a job than getting the question right. Ash Aug 28 '10 at 6:31

4 Please read the following as the answers provided here are ridiculous: stackoverflow.com/questions/4406110/ Matthieu N. Dec 26 '10 at 9:10

6 The solution of summing the numbers requires log(N) space unless you consider the space requirement for an unbounded integer to be O(1). But if you allow for unbounded integers, then you have as much space as you want with just one integer. Udo Klein Apr 10 '13 at 5:53

40 Answers

Here's a summary of Dimitris Andreou's link.

Remember the sums of i-th powers, where i = 1, 2, ..., k. This reduces the problem to solving the system of equations

a_1 + a_2 + ... + a_k = b_1
a_1^2 + a_2^2 + ... + a_k^2 = b_2
...
a_1^k + a_2^k + ... + a_k^k = b_k

Using Newton's identities, knowing b_i allows us to compute

c_1 = a_1 + a_2 + ... + a_k
c_2 = a_1 a_2 + a_1 a_3 + ... + a_(k-1) a_k
...
c_k = a_1 a_2 ... a_k

If you expand the polynomial (x - a_1)...(x - a_k), the coefficients will be exactly c_1, ..., c_k (up to sign); see Viète's formulas. Since every polynomial factors uniquely (the ring of polynomials is a Euclidean domain), this means the a_i are uniquely determined, up to permutation.

This ends a proof that remembering powers is enough to recover the numbers. For constant k, this is a good approach.

However, when k is varying, the direct approach of computing c_1, ..., c_k is prohibitively expensive, since e.g. c_k is the product of all missing numbers, of magnitude n!/(n-k)!. To overcome this, perform computations in the field Z_q, where q is a prime such that n <= q < 2n; it exists by Bertrand's postulate. The proof doesn't need to be changed, since the formulas still hold, and factorization of polynomials is still unique. You also need an algorithm for factorization over finite fields, for example the one by Berlekamp or Cantor-Zassenhaus.

High level pseudocode for constant k:

1. Compute the i-th powers of the given numbers.
2. Subtract to get the sums of i-th powers of the unknown numbers. Call the sums b_i.
3. Use Newton's identities to compute coefficients from b_i; call them c_i. Basically, c_1 = b_1; c_2 = (c_1 b_1 - b_2)/2; see Wikipedia for exact formulas.
4. Factor the polynomial x^k - c_1 x^(k-1) + ... + (-1)^k c_k.
5. The roots of the polynomial are the needed numbers a_1, ..., a_k.

For varying k, find a prime n <= q < 2n using e.g. Miller-Rabin, and perform the steps with all numbers reduced modulo q.

As Heinrich Apfelmus commented, instead of a prime q you can use q = 2^⌈log n⌉ and perform arithmetic in a finite field.
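
The constant-k pseudocode can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the answer's own implementation: it uses Python's exact big integers instead of arithmetic in Z_q, and it finds the roots by testing every candidate 1..n with Horner's rule as a simple stand-in for a real factoring algorithm such as Berlekamp or Cantor-Zassenhaus. The name `find_k_missing` is hypothetical.

```python
def find_k_missing(bag, n, k):
    """Recover the k numbers of 1..n missing from `bag` via power sums."""
    # b[i] = (sum of i-th powers of 1..n) - (sum of i-th powers of the bag)
    #      = sum of i-th powers of the missing numbers, for i = 1..k.
    b = [0] * (k + 1)
    for sign, values in ((1, range(1, n + 1)), (-1, bag)):
        for v in values:
            power = 1
            for i in range(1, k + 1):
                power *= v
                b[i] += sign * power

    # Newton's identities: i * c_i = sum_{j=1..i} (-1)^(j-1) * c_(i-j) * b_j
    c = [1] + [0] * k
    for i in range(1, k + 1):
        acc = 0
        for j in range(1, i + 1):
            acc += (-1) ** (j - 1) * c[i - j] * b[j]
        c[i] = acc // i  # the division is exact

    # Coefficients of x^k - c_1 x^(k-1) + ... + (-1)^k c_k, highest first.
    coeffs = [(-1) ** i * c[i] for i in range(k + 1)]

    def evaluate(x):
        acc = 0
        for a in coeffs:  # Horner's rule
            acc = acc * x + a
        return acc

    # Stand-in for polynomial factoring: test each candidate root.
    return [x for x in range(1, n + 1) if evaluate(x) == 0]


bag = [i for i in range(1, 101) if i not in (3, 50, 99)]
print(find_k_missing(bag, 100, 3))  # [3, 50, 99]
```

Only the k power sums survive the scan of the bag, so the state kept between input elements depends on k rather than N, although the integers themselves grow with n and k.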

answered Aug 16 '10 at 12:13 by sdcvvc; edited Aug 16 '10 at 14:11
3 You don't have to use a prime field, you can also use q = 2^(log n) . (How did you make the super-
and subscripts?!) Heinrich Apfelmus Aug 16 '10 at 12:45

3 Also, you can calculate the c_k on the fly, without using the power sums, thanks to the formula $c^{k+1}_m =
c^k_{m+1} + c^k_m x_{k+1}$ where the superscript $k$ denotes the number of variables and $m$ the
degree of the symmetric polynomial. Heinrich Apfelmus Aug 16 '10 at 12:50

22 +1 This is really, really clever. At the same time, it's questionable, whether it's really worth the effort, or
whether (parts of) this solution to a quite artificial problem can be reused in another way. And even if this
were a real world problem, on many platforms the most trivial O(N^2) solution will probably possibly
outperform this beauty for even reasonably high N . Makes me think of this: tinyurl.com/c8fwgw
Nonetheless, great work! I wouldn't have had the patience to crawl through all the math :) back2dos Aug
16 '10 at 13:52

74 I think this is a wonderful answer. I think this also illustrates how poor an interview question it would be to extend the missing numbers beyond one. Even the first is kind of a gotcha, but it's common enough that it basically shows "you did some interview prep." But to expect a CS major to know how to go beyond k=1 (especially "on the spot" in an interview) is a bit silly. corsiKa Mar 25 '11 at 21:03

16 I bet entering all number in a hash set and iterating over the 1...N suite using lookups to determine if
numbers are missing, would be the most generic, fastest in average regarding k variations, most
debuggable most maintainable and understandable solution. Of course the math way is impressive but
somewhere along the way you need to be an engineer and not a mathematician. Especially when business
is involved. v.oddou Apr 3 '14 at 7:54


You will find it by reading the couple of pages of Muthukrishnan - Data Stream Algorithms: Puzzle 1: Finding Missing Numbers. It shows exactly the generalization you are looking for. Probably this is what your interviewer read and why he posed these questions.

Now, if only people would start deleting the answers that are subsumed or superseded by
Muthukrishnan's treatment, and make this text easier to find. :)

Also see sdcvvc's directly related answer, which also includes pseudocode (hurray! no need to
read those tricky math formulations :)) (thanks, great work!).

answered Aug 16 '10 at 11:26 by Dimitris Andreou; edited Jul 15 at 1:54 by gsamaras

7 How do you translate that into code?!? Eldelshell Aug 16 '10 at 12:05

Oooh... That's interesting. I have to admit I got a bit confused by the maths but I was just skimming it. Might leave it open to look at more later. :) And +1 to get this link more findable. ;-) Chris Aug 16 '10 at 12:21

2 The google books link doesn't work for me. Here a better version [PostScript File]. Heinrich Apfelmus Aug
16 '10 at 12:31

5 Wow. I didn't expect this to get upvoted! Last time I posted a reference to the solution (Knuth's, in that case)
instead of trying to solve it myself, it was actually downvoted: stackoverflow.com/questions/3060104/ The
librarian inside me rejoices, thanks :) Dimitris Andreou Aug 16 '10 at 12:33

2 Please read the following as the answers provided here are ridiculous: stackoverflow.com/questions/4406110/
Matthieu N. Dec 26 '10 at 9:12


We can solve Q2 by summing both the numbers themselves, and the squares of the numbers.

We can then reduce the problem to

k1 + k2 = x
k1^2 + k2^2 = y

Where x and y are how far the sums are below the expected values.

Substituting gives us:

(x-k2)^2 + k2^2 = y

Which we can then solve to determine our missing numbers.
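
A short Python sketch of this substitution, assuming the bag is an iterable of distinct integers from 1..n with exactly two missing (the name `find_two_missing` is made up for illustration). Expanding (x - k2)^2 + k2^2 = y gives 2*k2^2 - 2*x*k2 + (x^2 - y) = 0, whose roots are (x ± sqrt(2y - x^2)) / 2.

```python
import math


def find_two_missing(bag, n):
    """Return the two numbers of 1..n missing from `bag`, smaller first."""
    total = 0
    total_sq = 0
    for v in bag:  # single pass, two running totals
        total += v
        total_sq += v * v

    x = n * (n + 1) // 2 - total                   # k1 + k2
    y = n * (n + 1) * (2 * n + 1) // 6 - total_sq  # k1^2 + k2^2

    # 2y - x^2 equals (k1 - k2)^2, so the integer square root is exact.
    diff = math.isqrt(2 * y - x * x)
    k2 = (x - diff) // 2
    k1 = x - k2
    return k2, k1


bag = [i for i in range(1, 11) if i not in (4, 6)]
print(find_two_missing(bag, 10))  # (4, 6)
```

With 4 and 6 missing from 1..10 this gives x = 10 and y = 52, matching the worked example in the comments.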

answered Aug 16 '10 at 10:37 by Anon.

4 +1; I've tried the formula in Maple for select numbers and it works. I still couldn't convince myself WHY it
works, though. polygenelubricants Aug 16 '10 at 11:12

2 @polygenelubricants: If you wanted to prove correctness, you would first show that it always provides a
correct solution (that is, it always produces a pair of numbers which, when removing them from the set, would
result in the remainder of the set having the observed sum and sum-of-squares). From there, proving
uniqueness is as simple as showing that it only produces one such pair of numbers. Anon. Aug 16 '10 at
11:50

3 The nature of the equations means that you will get two values of k2 from that equation. However, from the first equation that you use to generate k1 you can see that these two values of k2 will mean that k1 is the other value, so you have two solutions that are the same numbers the opposite way around. If you arbitrarily declared that k1>k2 then you'd only have one solution to the quadratic equation and thus one solution overall. And clearly by the nature of the question an answer always exists so it always works. Chris Aug 16 '10 at 12:06

3 For a given sum k1+k2, there are many pairs. We can write these pairs as k1 = a+b and k2 = a-b where a = (k1+k2)/2. a is unique for a given sum. The sum of the squares is (a+b)**2 + (a-b)**2 = 2*(a**2 + b**2). For a given sum k1+k2, the a**2 term is fixed and we see that the sum of the squares will be unique due to the b**2 term. Therefore, the values x and y are unique for a pair of integers. phkahler Aug 16 '10 at 14:31

3 This is awesome. @user3281743 here's an example. Let the missing numbers (k1 and k2) be 4 and 6. Sum(1
-> 10) = 55 and Sum(1^2 -> 10^2) = 385. Now let x = 55 - (Sum(All remaining numbers)) and y = 385 -
(Sum(Squares of all remaining numbers)) thus x = 10 and y = 52. Substitute as shown which leaves us with:
(10 - k2)^2 + k2^2 = 52 which you can simplify to: 2k^2 - 20k + 48 = 0. Solving the quadratic equation gives
you 4 and 6 as the answer. AlexKoren Oct 12 '15 at 2:07


As @j_random_hacker pointed out, this is quite similar to Finding duplicates in O(n) time and O(1) space, and an adaptation of my answer there works here too.

Assuming that the "bag" is represented by a 1-based array A[] of size N - k, we can solve Qk in O(N) time and O(k) additional space.

First, we extend our array A[] by k elements, so that it is now of size N. This is the O(k) additional space. We then run the following pseudo-code algorithm:

    for i := n - k + 1 to n
        A[i] := A[1]
    end for

    for i := 1 to n - k
        while A[A[i]] != A[i]
            swap(A[i], A[A[i]])
        end while
    end for

    for i := 1 to n
        if A[i] != i then
            print i
        end if
    end for

The first loop initialises the k extra entries to the same as the first entry in the array (this is just a
convenient value that we know is already present in the array - after this step, any entries that were
missing in the initial array of size N-k are still missing in the extended array).

The second loop permutes the extended array so that if element x is present at least once, then
one of those entries will be at position A[x] .

Note that although it has a nested loop, it still runs in O(N) time - a swap only occurs if there is an
i such that A[i] != i , and each swap sets at least one element such that A[i] == i , where
that wasn't true before. This means that the total number of swaps (and thus the total number of
executions of the while loop body) is at most N-1 .

The third loop prints those indexes of the array i that are not occupied by the value i - this
means that i must have been missing.
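
A rough Python rendering of the three loops, for readers who want to trace them. It is a sketch under two assumptions: the bag is non-empty, and (unlike the answer, which extends the array in place) it builds a fresh padded list so that indices can stay 1-based. The name `find_missing_by_swapping` is hypothetical.

```python
def find_missing_by_swapping(bag, n):
    """Return the numbers of 1..n missing from `bag` (non-empty, distinct values)."""
    k = n - len(bag)
    # a[0] is unused padding so that a[1..n] mirrors the 1-based pseudocode.
    # The k extra slots are filled with a value known to be present.
    a = [0] + list(bag) + [bag[0]] * k

    for i in range(1, n - k + 1):
        while a[a[i]] != a[i]:
            j = a[i]                 # capture the target index before swapping
            a[i], a[j] = a[j], a[i]

    # Position i holds i exactly when i was present in the bag.
    return [i for i in range(1, n + 1) if a[i] != i]


print(find_missing_by_swapping([5, 3, 1, 7, 2], 7))  # [4, 6]
```

The temporary `j` matters in Python: writing `a[i], a[a[i]] = a[a[i]], a[i]` would re-read `a[i]` after it has already been overwritten.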

answered Apr 22 '11 at 4:32 by caf; edited Dec 20 '14 at 4:13 by Pavan Manjunath

2 I wonder why so few people vote this answer up and even did not mark it as a correct answer. Here is the
code in Python. It runs in O(n) time and need extra space O(k). pastebin.com/9jZqnTzV wall-e Oct 22 '12 at
4:03

1 @caf this is quite similar to setting the bits and counting the places where the bit is 0. And I think as you are
creating an integer array more memory is occupied. Fox Apr 22 '13 at 6:41

2 "Setting the bits and counting the places where the bit is 0" requires O(n) extra space, this solution shows how
to use O(k) extra space. caf Dec 12 '13 at 23:19

3 Doesn't work with streams as input and modifies the input array (though I like it very much and the idea is
fruitful). comco Jan 30 '14 at 14:07

2 @v.oddou: Nope, it's fine. The swap will change A[i] , which means that the next iteration won't be
comparing the same two values as the previous one. The new A[i] will be the same as the last loop's
A[A[i]] , but the new A[A[i]] will be a new value. Try it and see. caf Apr 3 '14 at 10:55


I asked a 4-year-old to solve this problem. He sorted the numbers and then counted along. This has a space requirement of O(kitchen floor), and it works just as easily however many balls are missing.

answered Apr 12 '13 at 18:59 by Colonel Panic; edited May 29 '14 at 12:53 by Peter Mortensen

8 ;) your 4 year old must be approaching 5 or/and is a genius. my 4 year old daughter cannot even count

properly to 4 yet. well to be fair let's say she just barely finally integrated the "4"'s existence. otherwise until
now she would always skip it. "1,2,3,5,6,7" was her usual counting sequence. I asked her to add pencils
together and she would manage 1+2=3 by denumbering all again from scratch. I'm worried actually... :'( meh..
v.oddou Apr 3 '14 at 8:07
This does not provide an answer to the question. To critique or request clarification from an author, leave a
comment below their post. hiro protagonist Sep 17 '15 at 8:36

simple yet effective approach. PabTorre Oct 3 '15 at 16:01

1 O(kitchen floor) haha - but wouldn't that be O(n^2) ? user3235832 Jul 9 at 15:42

I haven't checked the maths, but I suspect that computing Σ(n^2) in the same pass as we compute Σ(n) would provide enough info to get two missing numbers. Do Σ(n^3) as well if there are three, and so on.

answered Aug 16 '10 at 10:38 by AakashM

Not sure if it's the most efficient solution, but I would loop over all entries, and use a bitset to remember which numbers are set, and then test for 0 bits.

I like simple solutions, and I even believe that it might be faster than calculating the sum, or the sum of squares etc.

answered Aug 16 '10 at 10:38 by Chris Lercher

7 I did propose this obvious answer, but this is not what the interviewer wanted. I explicitly said in the question
that this is not the answer I'm looking for. Another obvious answer: sort first. Neither the O(N) counting sort
nor O(N log N) comparison sort is what I'm looking for, although they are both very simple solutions.
polygenelubricants Aug 16 '10 at 11:14

@polygenelubricants: I can't find where you said that in your question. If you consider the bitset to be the
result, then there is no second pass. The complexity is (if we consider N to be constant, as the interviewer
suggests by saying, that the complexity is "defined in k not N") O(1), and if you need to construct a more
"clean" result, you get O(k), which is the best you can get, because you always need O(k) to create the clean
result. Chris Lercher Aug 16 '10 at 11:20

"Note that I'm not looking for the obvious set-based solution (e.g. using a bit set,". The second last paragraph
from the original question. hrnt Aug 16 '10 at 11:24

3 @hrnt: Yes, the question was edited a few minutes ago. I'm just giving the answer that I would expect from an interviewee... Artificially constructing a sub-optimal solution (you can't beat O(n) + O(k) time, no matter what you do) doesn't make sense to me, except if you can't afford O(n) additional space, but the question isn't explicit on that. Chris Lercher Aug 16 '10 at 11:30

2 I've edited the question again to further clarify. I do appreciate the feedback/answer. polygenelubricants
Aug 16 '10 at 11:42


Wait a minute. As the question is stated, there are 100 numbers in the bag. No matter how big k is, the problem can be solved in constant time because you can use a set and remove numbers from the set in at most 100 - k iterations of a loop. 100 is constant. The set of remaining numbers is your answer.

If we generalise the solution to the numbers from 1 to N, nothing changes except N is not a constant, so we are in O(N - k) = O(N) time. For instance, if we use a bit set, we set the bits to 1 in O(N) time, iterate through the numbers, setting the bits to 0 as we go (O(N-k) = O(N)) and then we have the answer.

It seems to me that the interviewer was asking you how to print out the contents of the final set in O(k) time rather than O(N) time. Clearly, with a bit set, you have to iterate through all N bits to determine whether you should print the number or not. However, if you change the way the set is implemented you can print out the numbers in k iterations. This is done by putting the numbers into an object to be stored in both a hash set and a doubly linked list. When you remove an object from the hash set, you also remove it from the list. The answers will be left in the list which is now of length k.
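
In Python, `collections.OrderedDict` is one ready-made structure that pairs a hash table with a doubly linked list, so the idea above can be sketched with it. The function name `missing_via_ordered_set` is an assumption for illustration, and note that this still needs O(N) extra space, which the question explicitly rules out.

```python
from collections import OrderedDict


def missing_via_ordered_set(bag, n):
    """Return the numbers of 1..n missing from `bag`, in increasing order."""
    # Keys 1..n act as the set; the values are unused.
    remaining = OrderedDict.fromkeys(range(1, n + 1))
    for v in bag:
        del remaining[v]  # removes v from both the hash table and the list
    # Only the k surviving keys are left to walk.
    return list(remaining)


bag = [i for i in range(1, 101) if i not in (3, 50, 99)]
print(missing_via_ordered_set(bag, 100))  # [3, 50, 99]
```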

answered Aug 16 '10 at 11:25 by JeremyP

4 This answer is too simple, and we all know that simple answers don't work! ;) Seriously though, original
question should probably emphasize O(k) space requirement. DK. Sep 2 '10 at 20:48

The problem is not that it is simple but that you'll have to use O(n) additional memory for the map. The problem must be solved in constant time and constant memory. Mojo Risin Mar 14 '11 at 14:58

2 I bet you can prove the minimal solution is at least O(N). because less, would mean that you didn't even
LOOK at some numbers, and since there is no ordering specified, looking at ALL numbers is mandatory.
v.oddou Apr 3 '14 at 8:12

If we look at the input as a stream, and n is too large to keep in memory, the O(k) memory requirement makes
sense. We can still use hashing though: Just make k^2 buckets and use the simple sum algorithm on each of
them. That's only k^2 memory and a few more buckets can be used to get high probability of success.
Thomas Ahle Apr 26 at 13:31

The problem with solutions based on sums of numbers is they don't take into account the cost of storing and working with numbers with large exponents... in practice, for it to work for very large n, a big numbers library would be used. We can analyse the time and space complexity of sdcvvc and Dimitris Andreou's algorithms.

Storage:

    l_j = ceil(log_2(\sum_{i=1}^n i^j))
    l_j > log_2 n^j  (assuming n >= 0, k >= 0)
    l_j > j log_2 n \in \Omega(j log n)

    l_j < log_2((\sum_{i=1}^n i)^j) + 1
    l_j < j log_2(n) + j log_2(n + 1) - j log_2(2) + 1
    l_j < j log_2 n + j + c \in O(j log n)

So l_j \in \Theta(j log n)

Total storage used: \sum_{j=1}^k l_j \in \Theta(k^2 log n)

Time used: assuming that computing a^j takes ceil(log_2 j) time, total time:

    t = k ceil(\sum_{i=1}^n log_2(i)) = k ceil(log_2(\prod_{i=1}^n i))
    t > k log_2(n^n + O(n^(n-1)))
    t > k log_2(n^n) = kn log_2(n) \in \Omega(kn log n)
    t < k log_2(\prod_{i=1}^n i^i) + 1
    t < kn log_2(n) + 1 \in O(kn log n)

Total time used: \Theta(kn log n)

If this time and space is satisfactory, you can use a simple recursive algorithm. Let b!i be the ith
entry in the bag, n the number of numbers before removals, and k the number of removals. In
Haskell syntax...

    let
      -- O(1)
      isInRange low high v = (v >= low) && (v <= high)
      -- O(n - k): how many bag entries fall inside [low, high]
      countInRange low high = sum $ map (fromEnum . isInRange low high . (!) b) [1..(n-k)]
      -- l accumulates the missing numbers found so far
      findMissing l low high krange
        -- O(1) if there is nothing to find.
        | krange == 0 = l
        -- O(1) if there is only one possibility.
        | low == high = low:l
        -- Otherwise total of O(kn log(n)) time
        | otherwise =
            let
              mid = (low + high) `div` 2
              -- missing in [low, mid] = size of the range minus entries present
              klow = (mid - low + 1) - countInRange low mid
              khigh = krange - klow
            in
              findMissing (findMissing l low mid klow) (mid + 1) high khigh
    in
      -- the values range over 1..n, of which k are missing
      findMissing [] 1 n k

Storage used: O(k) for list, O(log(n)) for stack: O(k + log(n)). This algorithm is more intuitive, has the same time complexity, and uses less space.
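
For readers who do not read Haskell, the same range-halving idea can be sketched in Python. This is an illustrative translation under assumptions, not the answer's code: the name `find_missing_by_counting` is made up, and the bag is assumed to be re-scannable (a list rather than a one-shot stream), since every level of the recursion counts entries again.

```python
def find_missing_by_counting(bag, low, high, k_range=None):
    """Return the numbers in [low, high] that are missing from `bag`."""
    if k_range is None:
        present = sum(1 for v in bag if low <= v <= high)
        k_range = (high - low + 1) - present
    if k_range == 0:   # nothing to find in this range
        return []
    if low == high:    # a single candidate, and it is missing
        return [low]

    mid = (low + high) // 2
    # Missing in the lower half = size of that half minus entries present.
    present_low = sum(1 for v in bag if low <= v <= mid)
    k_low = (mid - low + 1) - present_low
    return (find_missing_by_counting(bag, low, mid, k_low)
            + find_missing_by_counting(bag, mid + 1, high, k_range - k_low))


bag = [i for i in range(1, 101) if i not in (3, 50, 99)]
print(find_missing_by_counting(bag, 1, 100))  # [3, 50, 99]
```

Each recursion level scans the bag at most once per range that still contains a missing number, which is where the O(kn log n) time bound comes from.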

answered Sep 2 '10 at 11:41 by a1kmm

1 +1, looks nice but you lost me going from line 4 to line 5 in snippet #1 -- could you explain that further?
Thanks! j_random_hacker Oct 28 '10 at 8:16

Can you explain in words how your algorithm works? Thomas Ahle Apr 26 at 13:34

Here's a solution that uses k bits of extra storage, without any clever tricks and just straightforward. Execution time O(n), extra space O(k). Just to prove that this can be solved without reading up on the solution first or being a genius:

    #include <stdbool.h>
    #include <stdio.h>

    void puzzle (int* data, int n, bool* extra, int k)
    {
        // data contains n distinct numbers from 1 to n + k, extra provides
        // space for k extra bits.

        // Rearrange the array so there are (even) even numbers at the start
        // and (odd) odd numbers at the end.
        int even = 0, odd = 0;
        while (even + odd < n)
        {
            if (data [even] % 2 == 0) ++even;
            else if (data [n - 1 - odd] % 2 == 1) ++odd;
            else { int tmp = data [even]; data [even] = data [n - 1 - odd];
                   data [n - 1 - odd] = tmp; ++even; ++odd; }
        }

        // Erase the lowest bits of all numbers and set the extra bits to 0.
        for (int i = even; i < n; ++i) data [i] -= 1;
        for (int i = 0; i < k; ++i) extra [i] = false;

        // Set a bit for every number that is present
        for (int i = 0; i < n; ++i)
        {
            int tmp = data [i];
            tmp -= (tmp % 2);
            // The odd numbers sit at indices >= even (the original tested
            // i >= odd, which is only right when even == odd).
            if (i >= even) ++tmp;
            if (tmp <= n) data [tmp - 1] += 1; else extra [tmp - n - 1] = true;
        }

        // Print out the missing ones
        for (int i = 1; i <= n; ++i)
            if (data [i - 1] % 2 == 0) printf ("Number %d is missing\n", i);
        for (int i = n + 1; i <= n + k; ++i)
            if (! extra [i - n - 1]) printf ("Number %d is missing\n", i);

        // Restore the lowest bits again: clear the marker bits first, then
        // put the 1 back on the odd numbers stored at indices >= even.
        for (int i = 0; i < n; ++i) data [i] -= data [i] % 2;
        for (int i = even; i < n; ++i) data [i] += 1;
    }

answered Apr 7 '14 at 18:53 by gnasher729; edited Apr 14 '14 at 10:29

Did you want (data [n - 1 - odd] % 2 == 1) ++odd; ? Charles Apr 12 '14 at 13:36

Thanks, fixed it. gnasher729 Apr 14 '14 at 10:29

Could you explain how this works? I don't understand. Teepeemm Sep 26 '14 at 14:03

The solution would be very, very, simple if I could use an array of (n + k) booleans for temporary storage, but
that is not allowed. So I rearrange the data, putting the even numbers at the beginning, and the odd numbers
at the end of the array. Now the lowest bits of those n numbers can be used for temporary storage, because I
know how many even and odd numbers there are and can reconstruct the lowest bits! These n bits and the k
extra bits are exactly the (n + k) booleans that I needed. gnasher729 Oct 15 '14 at 16:40

This wouldn't work if the data were too large to keep in memory, and you only saw it as a stream. Deliciously
hacky though :) Thomas Ahle Apr 26 at 13:59


Can you check if every number exists? If yes you may try this:

S = sum of all numbers in the bag (S < 5050)
Z = sum of the missing numbers = 5050 - S

If the missing numbers are x and y then:

x = Z - y and
max(x) = Z - 1

So you check the range from 1 to max(x) and find the number.

answered Aug 16 '10 at 10:37 by Ilian Iliev; edited Nov 21 '12 at 17:57 by Nakilon

1 What does max(x) mean, when x is a number? Thomas Ahle Apr 26 at 13:56

he probably means max from the set of numbers JavaHopper Aug 9 at 16:22

You can solve Q2 if you have the sum of both lists and the product of both lists.

(l1 is the original, l2 is the modified list)

s = sum(l1) - sum(l2)
m = mul(l1) / mul(l2)

We can optimise this since the sum of an arithmetic series is n times the average of the first and last terms:

n = len(l1)
s = (n/2)*(n+1) - sum(l2)

Now we know that (if a and b are the removed numbers):

a + b = s
a * b = m

So we can rearrange to:

a = s - b
b * (s - b) = m

And multiply out:

-b^2 + s*b = m

And rearrange so the right side is zero:

-b^2 + s*b - m = 0

Then we can solve with the quadratic formula:

b = (-s + sqrt(s^2 - (4*-1*-m)))/-2
a = s - b

Sample Python 3 code:

    from functools import reduce
    import operator
    import math

    x = list(range(1, 21))
    sx = len(x) * (len(x) + 1) // 2      # expected sum of 1..20
    x.remove(15)
    x.remove(5)
    mul = lambda l: reduce(operator.mul, l)
    s = sx - sum(x)                      # a + b
    m = mul(range(1, 21)) // mul(x)      # a * b, exact integer division
    # Same root as the formula above, kept in integer arithmetic
    b = (s - math.isqrt(s**2 - 4*m)) // 2
    a = s - b
    print(a, b)  # 15 5

I do not know the complexity of the sqrt, reduce and sum functions so I cannot work out the
complexity of this solution (if anyone does know please comment below.)

answered Nov 16 '14 at 16:14 by Tuomas Laakkonen

How much time and memory does it use to calculate x1*x2*x3*... ? Thomas Ahle Apr 26 at 13:57

@ThomasAhle It is O(n)-time and O(1)-space on the length of the list, but in reality it's more as multiplication
(at least in Python) is O(n^1.6)-time on the length of the number and numbers are O(log n)-space on their
length. Tuomas Laakkonen Apr 29 at 16:21

@ThomasAhle No, log(a^n) = n*log(a) so you would have O(l log k)-space to store the number. So given a list
of length l and original numbers of length k, you would have O(l)-space but the constant factor (log k) would be
lower than just writing them all out. (I don't think my method is a particularly good way of answering the
question.) Tuomas Laakkonen Apr 29 at 17:16

To solve the 2 (and 3) missing numbers question, you can modify quickselect, which on average runs in O(n) and uses constant memory if partitioning is done in-place.

1. Partition the set with respect to a random pivot p into partitions l, which contains numbers smaller than the pivot, and r, which contains numbers greater than the pivot.
2. Determine which partitions the 2 missing numbers are in by comparing the pivot value to the size of each partition (p - 1 - count(l) = count of missing numbers in l and n - count(r) - p = count of missing numbers in r).
3. a) If each partition is missing one number, then use the difference of sums approach to find each missing number.

   (1 + 2 + ... + (p-1)) - sum(l) = missing #1 and ((p+1) + (p+2) + ... + n) - sum(r) = missing #2

   b) If one partition is missing both numbers and the partition is empty, then the missing numbers are either (p-1,p-2) or (p+1,p+2) depending on which partition is missing the numbers.

   If one partition is missing 2 numbers but is not empty, then recurse onto that partition.

With only 2 missing numbers, this algorithm always discards at least one partition, so it retains the O(n) average time complexity of quickselect. Similarly, with 3 missing numbers this algorithm also discards at least one partition with each pass (because, as with 2 missing numbers, at most 1 partition will contain multiple missing numbers). However, I'm not sure how much the performance decreases when more missing numbers are added.

Here's an implementation that does not use in-place partitioning, so this example does not meet the
space requirement but it does illustrate the steps of the algorithm:

<?php

$list = range(1,100);
unset($list[3]);
unset($list[31]);

findMissing($list,1,100);

function findMissing($list, $min, $max) {
    if(empty($list)) {
        print_r(range($min, $max));
        return;
    }

    $l = $r = [];
    $pivot = array_pop($list);

    foreach($list as $number) {
        if($number < $pivot) {
            $l[] = $number;
        }
        else {
            $r[] = $number;
        }
    }

    if(count($l) == $pivot - $min - 1) {
        // only 1 missing number use difference of sums
        print array_sum(range($min, $pivot-1)) - array_sum($l) . "\n";
    }
    else if(count($l) < $pivot - $min) {
        // more than 1 missing number, recurse
        findMissing($l, $min, $pivot-1);
    }

    if(count($r) == $max - $pivot - 1) {
        // only 1 missing number use difference of sums
        print array_sum(range($pivot + 1, $max)) - array_sum($r) . "\n";
    } else if(count($r) < $max - $pivot) {
        // more than 1 missing number recurse
        findMissing($r, $pivot+1, $max);
    }
}

edited May 18 at 3:21, answered Nov 8 '15 at 2:45 by FuzzyTree

Partitioning the set is like using linear space. At least it wouldn't work in a streaming setting. Thomas Ahle
Apr 26 at 13:44

@ThomasAhle see en.wikipedia.org/wiki/Selection_algorithm#Space_complexity. Partitioning the set in place
only requires O(1) additional space - not linear space. In a streaming setting it would be O(k) additional space,
however, the original question does not mention streaming. FuzzyTree Apr 26 at 14:37

Not directly, but he does write "you must scan the input in O(N), but you can only capture small amount of
information (defined in terms of k not N)" which is usually the definition of streaming. Moving all the numbers
for partitioning isn't really possible unless you have an array of size N. It's just that the question has a lot of
answers which seem to ignore this constraint. Thomas Ahle Apr 26 at 15:10

1 But as you say, the performance may decrease as more numbers are added? We can also use the linear time
median algorithm, to always get a perfect cut, but if the k numbers are well spread out in 1,...,n, won't you
have to go about logk levels "deep" before you can prune any branches? Thomas Ahle Apr 26 at 22:01

1 The worst-case running time is indeed nlogk because you need to process the whole input at most logk times,
and then it's a geometric sequence (one that starts with at most n elements). The space requirements are logn
when implemented with plain recursion, but they can be made O(1) by running an actual quickselect and
ensuring the correct length of each partition. emu May 4 at 8:04


I think this can be done without any complex mathematical equations and theories. Below is a
proposal for an in place and O(2n) time complexity solution:

Input form assumptions :

# of numbers in bag = n

# of missing numbers = k

The numbers in the bag are represented by an array of length n

Length of input array for the algo = n

Missing entries in the array (numbers taken out of the bag) are replaced by the value of the first
element in the array.

Eg. Initially bag looks like [2,9,3,7,8,6,4,5,1,10]. If 4 is taken out, value of 4 will become 2 (the first
element of the array). Therefore after taking 4 out the bag will look like [2,9,3,7,8,6,2,5,1,10]

The key to this solution is to tag the INDEX of a visited number by negating the value at that INDEX
as the array is traversed.

IEnumerable<int> GetMissingNumbers(int[] arrayOfNumbers)
{
    List<int> missingNumbers = new List<int>();
    int arrayLength = arrayOfNumbers.Length;

    //First Pass
    for (int i = 0; i < arrayLength; i++)
    {
        int index = Math.Abs(arrayOfNumbers[i]) - 1;
        if (index > -1)
        {
            arrayOfNumbers[index] = Math.Abs(arrayOfNumbers[index]) * -1; //Marking the visited indexes
        }
    }

    //Second Pass to get missing numbers
    for (int i = 0; i < arrayLength; i++)
    {
        //If this index is unvisited, means this is a missing number
        if (arrayOfNumbers[i] > 0)
        {
            missingNumbers.Add(i + 1);
        }
    }

    return missingNumbers;
}

answered Dec 12 '12 at 18:57 by pickhunter

This uses too much memory. Thomas Ahle Apr 26 at 13:54

This might sound stupid, but, in the first problem presented to you, you would have to see all the
remaining numbers in the bag to actually add them up to find the missing number using that
equation.

So, since you get to see all the numbers, just look for the number that's missing. The same goes for
when two numbers are missing. Pretty simple I think. No point in using an equation when you get to
see the numbers remaining in the bag.

edited May 29 '14 at 12:46 by Peter Mortensen, answered Sep 2 '10 at 3:27 by Stephan M

2 I think the benefit of summing them up is that you don't have to remember which numbers you've already seen
(e.g., there's no extra memory requirement). Otherwise the only option is to retain a set of all the values seen
and then iterate over that set again to find the one that's missing. Dan Tao Sep 2 '10 at 23:00

3 This question is usually asked with the stipulation of O(1) space complexity. Matthieu N. Sep 14 '10 at 21:38

The sum of the first N numbers is N(N+1)/2. For N=100, Sum=100*(101)/2=5050 ; tmarthal Apr 29 '11 at
1:39

Maybe this algorithm can work for question 1:

1. Precompute the xor of the first 100 integers ( val = 1^2^3^...^100 )
2. xor the elements as they keep coming from the input stream ( val1 = val1^next_input , starting from val1 = 0 )
3. final answer = val^val1

Or even better:

def GetValue(A)
  val=0
  for i=1 to 100
  do
    val=val^i
  done
  for value in A:
  do
    val=val^value
  done
  return val

This algorithm can in fact be expanded for two missing numbers. The first step remains the same.
When we call GetValue with two missing numbers the result will be a1^a2 , where a1 and a2 are the
two missing numbers. Let's say

val = a1^a2

Now to sieve out a1 and a2 from val we take any set bit in val. Let's say the ith bit is set in val.
That means that a1 and a2 differ at the ith bit position. Now we do another iteration on
the original array and keep two xor values: one for the numbers which have the ith bit set and the
other for those which don't. We now have two buckets of numbers, and it's guaranteed that a1
and a2 will lie in different buckets. Now repeat what we did for finding one missing
element on each of the buckets.
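
The two-number extension described above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the bag is a list that can be read twice, the helper names xor_all and two_missing_xor are made up for this example, and the lowest set bit is used as the "any set bit" of val.

```python
from functools import reduce
from operator import xor


def xor_all(values):
    """XOR of every value in an iterable (0 for an empty one)."""
    return reduce(xor, values, 0)


def two_missing_xor(bag, n):
    """Return the two numbers from 1..n that are absent from bag."""
    # XOR of the full range with the bag leaves a1 ^ a2.
    val = xor_all(range(1, n + 1)) ^ xor_all(bag)
    # Lowest set bit of val: a bit position where a1 and a2 differ.
    bit = val & -val
    # Second pass: XOR only the numbers that have this bit set. Everything
    # present cancels out, leaving the missing number in this bucket.
    a1 = xor_all(i for i in range(1, n + 1) if i & bit) ^ xor_all(
        v for v in bag if v & bit
    )
    return tuple(sorted((a1, val ^ a1)))


bag = [i for i in range(1, 101) if i not in (5, 15)]
print(two_missing_xor(bag, 100))  # (5, 15)
```

Only the running xor values are kept, so the extra space stays constant, at the cost of a second pass over the input.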

edited May 4 at 4:21, answered Dec 6 '11 at 12:03 by bashrc
This only solves the problem for k=1 , right? But I like using xor over sums, it seems a bit faster.
Thomas Ahle Apr 26 at 13:52

@ThomasAhle Yes. I have called that out in my answer. bashrc Apr 26 at 16:29

Right. Do you have an idea what a "second order" xor might be, for k=2? Similar to using squares for sum,
could we "square" for xor? Thomas Ahle Apr 26 at 16:32

1 @ThomasAhle Modified it to work for 2 missing numbers. bashrc May 4 at 4:21

You could try using a Bloom Filter. Insert each number in the bag into the bloom, then iterate over
the complete 1-k set until reporting each one not found. This may not find the answer in all
scenarios, but might be a good enough solution.
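
A rough Python sketch of the Bloom filter idea, for illustration only. The BloomFilter class, the salted SHA-256 hashing, the default sizes and the choice to probe the full range 1..n are all assumptions made for this example, not a tuned implementation. Because a Bloom filter can give false positives, a missing number may be wrongly treated as present, so the result can be incomplete; anything it does report is genuinely missing.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter over integers (illustrative, not tuned)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, value):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


def probably_missing(bag, n, num_bits=4096, num_hashes=4):
    """Numbers in 1..n that the filter has definitely not seen."""
    seen = BloomFilter(num_bits, num_hashes)
    for value in bag:
        seen.add(value)
    return [i for i in range(1, n + 1) if i not in seen]


bag = [i for i in range(1, 101) if i not in (5, 15)]
print(probably_missing(bag, 100))
```

Note that keeping the false-positive rate low needs a number of bits proportional to the number of inserted elements, so this does not get around a space limit that is independent of N.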

edited Sep 2 '10 at 22:22, answered Sep 2 '10 at 16:29 by jdizzle

There is also the counting bloom filter, which allows deletion. Then you can just add all the numbers and
delete the ones you see in the stream. Thomas Ahle Apr 25 at 21:31

This is probably the best reference: arxiv.org/pdf/0704.3313.pdf Thomas Ahle Apr 26 at 15:35
I'd take a different approach to that question and probe the interviewer for more details about the
larger problem he's trying to solve. Depending on the problem and the requirements surrounding it,
the obvious set-based solution might be the right thing and the generate-a-list-and-pick-through-it-afterward
approach might not.

For example, it might be that the interviewer is going to dispatch n messages and needs to know
the k that didn't result in a reply and needs to know it in as little wall clock time as possible after
the n-k th reply arrives. Let's also say that the message channel's nature is such that even running
at full bore, there's enough time to do some processing between messages without having any
impact on how long it takes to produce the end result after the last reply arrives. That time can be
put to use inserting some identifying facet of each sent message into a set and deleting it as each
corresponding reply arrives. Once the last reply has arrived, the only thing to be done is to remove
its identifier from the set, which in typical implementations takes O(log k+1) . After that, the set
contains the list of k missing elements and there's no additional processing to be done.

This certainly isn't the fastest approach for batch processing pre-generated bags of numbers
because the whole thing runs O((log 1 + log 2 + ... + log n) + (log n + log n-1 + ... +
log k)) . But it does work for any value of k (even if it's not known ahead of time) and in the
example above it was applied in a way that minimizes the most critical interval.
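
As a minimal sketch of the set-based bookkeeping described above (the function name unanswered and the integer message IDs are assumptions for illustration; Python's built-in set is hash-based, so its operations are expected O(1) rather than the O(log k) of a tree-based set):

```python
def unanswered(sent_ids, reply_ids):
    """Return the IDs that were sent but never answered."""
    pending = set()
    for message_id in sent_ids:       # done while the messages go out
        pending.add(message_id)
    for message_id in reply_ids:      # done as each reply arrives
        pending.discard(message_id)
    # Whatever is left after the last reply is the set of k missing IDs.
    return pending


print(unanswered(range(1, 11), [1, 2, 3, 5, 6, 7, 9, 10]))  # {8, 4} in some order
```

The set holds up to n entries while messages are outstanding, so this trades memory for having the answer ready as soon as the last reply is processed.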

answered Sep 3 '10 at 2:57 by Blrfl

Would this work if you only have O(k^2) extra memory? Thomas Ahle Apr 26 at 13:50

A very simple way to do it in roughly O(N) time is to remove each element when seen in both lists.
This works for unsorted lists too and can be easily further optimized if the lists are both sorted.

import random

K = 2
missingNums = list(range(0, 101))
incompleteList = list(range(0, 101))

# Remove K numbers
for i in range(K):
    valueToRemove = random.choice(incompleteList)
    incompleteList.remove(valueToRemove)

# Remove every number that is still present; whatever is left is missing
for num in incompleteList:
    if num in missingNums:
        missingNums.remove(num)

print(missingNums)

answered Dec 27 '15 at 2:12 by Amir

1 This uses too much space. Thomas Ahle Apr 26 at 13:44

There is a general way to generalize streaming algorithms like this. The idea is to use a bit of
randomization to hopefully 'spread' the k elements into independent sub problems, where our
original algorithm solves the problem for us. This technique is used in sparse signal reconstruction,
among other things.

Make an array, a , of size u = k^2 .
Pick any universal hash function, h : {1,...,n} -> {1,...,u} . (Like multiply-shift)
For each i in 1, ..., n increase a[h(i)] += i
For each number x in the input stream, decrement a[h(x)] -= x .

If all of the missing numbers have been hashed to different buckets, the non-zero elements of the
array will now contain the missing numbers.

The probability that a particular pair is sent to the same bucket, is less than 1/u by definition of a
universal hash function. Since there are about k^2/2 pairs, we have that the error probability is at
most k^2/2/u=1/2 . That is, we succeed with probability at least 50%, and if we increase u we
increase our chances.
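
A hedged Python sketch of the bucket scheme above. Several details are assumptions made for this example: the function name find_missing_hashed, the use of the classic ((a*x + b) mod p) mod u hash family instead of multiply-shift, zero-based buckets, and retrying with a fresh hash (which needs a re-readable bag; in a true streaming setting one would instead run several hash functions side by side).

```python
import random


def find_missing_hashed(bag, n, k, max_tries=64, seed=0):
    """Return the k numbers of 1..n absent from bag, using k*k sum buckets."""
    rng = random.Random(seed)
    u = k * k
    p = 2_147_483_647  # the prime 2^31 - 1, assumed to be larger than n
    for _ in range(max_tries):
        a, b = rng.randrange(1, p), rng.randrange(p)
        buckets = [0] * u
        for i in range(1, n + 1):          # add the full range ...
            buckets[((a * i + b) % p) % u] += i
        for x in bag:                      # ... and subtract what is present
            buckets[((a * x + b) % p) % u] -= x
        found = [v for v in buckets if v != 0]
        # k non-zero buckets means every missing number sits alone in its
        # bucket, so each bucket value is exactly one missing number.
        if len(found) == k:
            return sorted(found)
    raise RuntimeError("no collision-free hash found")


bag = [i for i in range(1, 101) if i not in (5, 15, 42)]
print(find_missing_hashed(bag, 100, 3))  # [5, 15, 42]
```

If two missing numbers collide, their bucket holds their sum and fewer than k buckets are non-zero, which is how the sketch detects a failed attempt.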

Notice that this algorithm takes k^2 logn bits of space (We need logn bits per array bucket.)
This matches the space bound from @Dimitris Andreou's answers, which happens to also be
randomized. This algorithm also has constant time per update, rather than time k in the case of
power-sums.

I'd be surprised if there is a more efficient algorithm than the above. In theory (the space bound) as
well as in practice (the actual running time).

edited Apr 26 at 16:13, answered Apr 25 at 21:54 by Thomas Ahle

Note: We can also use xor in each bucket, rather than sum , if that's faster on our machine. Thomas Ahle
Apr 26 at 13:53

Interesting but I think this only respects the space constraint when k <= sqrt(n) - at least if u=k^2 ?
Suppose k=11 and n=100, then you would have 121 buckets and the algorithm would end up being similar to
having an array of 100 bits that you check off as you read each # from the stream. Increasing u improves the
chances of success but there's a limit to how much you can increase it before you exceed the space
constraint. FuzzyTree Apr 27 at 2:18

The problem makes most sense for n much larger than k , I think, but you can actually get space down to
k logn with a method very similar to the hashing described, while still having constant time updates. It's
described in gnunet.org/eppstein-set-reconciliation , like the sum of powers method, but basically you hash to
'two of k' buckets with a strong hash function like tabulation hashing, which guarantees that some bucket will
have only one element. To decode, you identify that bucket and remove the element from both of its buckets,
which (likely) frees another bucket and so on Thomas Ahle May 11 at 10:54

Very nice problem. I'd go for using a set difference for Qk. A lot of programming languages even
have support for it, like in Ruby:

missing = (1..100).to_a - bag

It's probably not the most efficient solution but it's one I would use in real life if I was faced with
such a task in this case (known boundaries, low boundaries). If the set of number would be very
large then I would consider a more efficient algorithm, of course, but until then the simple solution
would be enough for me.

answered Aug 16 '10 at 11:18 by DarkDust

This uses too much space. Thomas Ahle Apr 26 at 13:43

@ThomasAhle: Why are you adding useless comments to every second answer? What do you mean with it's
using too much space? DarkDust Apr 26 at 14:17

Because the question says that "We can't afford any additional space proportional to N." This solution does
exactly that. Thomas Ahle Apr 26 at 14:25

I believe I have an O(k) time and O(log(k)) space algorithm, given that you have the floor(x)
and log2(x) functions for arbitrarily big integers available:

You have a k -bit long integer (hence the log8(k) space) where you add the x^2 , where x is
the next number you find in the bag: s=1^2+2^2+... This takes O(N) time (which is not a problem
for the interviewer). At the end you get j=floor(log2(s)) which is the biggest number you're
looking for. Then s=s-j and you do again the above:

for (i = 0 ; i < k ; i++)
{
    j = floor(log2(s));
    missing[i] = j;
    s -= j;
}

Now, you usually don't have floor and log2 functions for 2756 -bit integers but instead for doubles.
So? Simply, for each 2 bytes (or 1, or 3, or 4) you can use these functions to get the desired
numbers, but this adds an O(N) factor to time complexity

edited Nov 21 '12 at 17:58 by Nakilon, answered Sep 7 '10 at 14:43 by CostasGR43

Try to find the product of numbers from 1 to 50:

Let product, P1 = 1 x 2 x 3 x ............. 50

When you take out numbers one by one, multiply them so that you get the product P2. But two
numbers are missing here, hence P2 < P1.

The product of the two missing terms, a x b = P1 - P2.

You already know the sum, a + b = S1.

From the above two equations, solve for a and b through a quadratic equation. a and b are your
missing numbers.

edited May 29 '14 at 12:45 by Peter Mortensen, answered Aug 25 '10 at 13:56 by Manjunath

Provably there are no quadratic equations for numbers 3 or greater. Just 2. Tatarize Dec 17 '15 at 14:12

I tried to apply the given formulae but I failed. Let's take N=3 (sequence {1,2,3} ) with two missing numbers
{a,b} = {1,2} . That results in ab = 6-3, a+b = 6 => b = 6-a, a^2-6a+3 = 0 => wrong. dma_k
Sep 2 at 8:44

I think this can be generalized like this:

Denote S, M as the initial values for the sum of arithmetic series and multiplication.

S = 1 + 2 + 3 + 4 + ... + n = (n+1)*n/2
M = 1 * 2 * 3 * 4 * .... * n

I should think about a formula to calculate this, but that is not the point. Anyway, if one number is
missing, you already provided the solution. However, if two numbers are missing then, let's denote
the new sum and total multiple by S1 and M1, which will be as follows:

S1 = S - (a + b)....................(1)

Where a and b are the missing numbers.

M1 = M - (a * b)....................(2)

Since you know S1, M1, M and S, the above equation is solvable to find a and b, the missing
numbers.

Now for the three numbers missing:

S2 = S - ( a + b + c)....................(1)

Where a, b and c are the missing numbers.

M2 = M - (a * b * c)....................(2)

Now you have 3 unknowns but only two equations to solve them with.

edited May 29 '14 at 12:59 by Peter Mortensen, answered Aug 16 '13 at 14:26 by Jack_of_All_Trades

The multiplication gets quite large though.. Also, how do you generalize to more than 2 missing numbers?
Thomas Ahle Apr 26 at 13:39

I have tried these formulae on a very simple sequence with N = 3 and missing numbers = {1, 2}. It didn't work, as
I believe the error is in formulae (2) which should read M1 = M / (a * b) (see that answer). Then it
works fine. dma_k Sep 2 at 8:37

I don't know whether this is efficient or not but I would like to suggest this solution.

1. Compute xor of the 100 elements
2. Compute xor of the 98 elements (after the 2 elements are removed)
3. Now (result of 1) XOR (result of 2) gives you the xor of the two missing nos, i.e. a XOR b if a
and b are the missing elements
4. Get the sum of the missing nos with your usual approach of the sum formula diff and let's say
the diff is d.

Now run a loop to get the possible pairs (p,q) both of which lies in [1 , 100] and sum to d.

When a pair is obtained check whether (result of 3) XOR p = q and if yes we are done.

Please correct me if I am wrong and also comment on time complexity if this is correct

answered Aug 4 '14 at 21:49 by user2221214

2 I don't think the sum and xor uniquely define two numbers. Running a loop to get all possible k-tuples that sum
to d takes time O(C(n,k-1))=O(n^(k-1)), which, for k>2, is bad. Teepeemm Sep 26 '14 at 13:57

The key is to use indexes to mark if a number is present or not in the range. Here we know that we
have 1 to N. Time complexity O(n) Space complexity O(1)

Followup questions: This may be modified to find if an element is missing from an AP of difference
d. Other variation may include find first missing +ve number from any random array containing -ve
number as well. Then first partition around 0 quick sort, then do this procedure on right side of
partition part of the array, do necessary modification.

public static void missing(int [] arr){
    for(int i=0; i< arr.length; i++){
        if(arr[i]!=-1 && arr[i]<=arr.length){
            int idx=i;
            while(idx>=0 && idx<arr.length&& arr[idx]!=-1 ){
                int temp =arr[idx];
                // temp-1 because array index starts from 0, i.e. a[0]=-1 indicates that 1 is present in the array
                arr[temp-1]=-1;
                idx=temp-1;
            }
        }
    }
}

After this we need to iterate over array, and check if a[i]!=-1, then i+1 is the missing number. We
have to be careful when a[i]>N.

edited Feb 26 '15 at 22:40, answered Feb 19 '15 at 1:36 by Rahul

"do a quick sort"? That doesn't fit inside the O(n) time and O(1) space complexities. GuyGreer Feb 26 '15 at
20:28

@GuyGreer, I should have been more precise with words. When I said quick sort , I meant partition around "0".
I think you didn't understand at all. You saw quick sort and jumped to down-vote! Rahul Feb 26 '15 at
22:33

What do you mean by "partition around 0"? I would interpret that to mean "find which numbers are greater than
0, and which less". But we know that the numbers come from 1 to N, so my interpretation doesn't gain us any
information. Teepeemm Nov 8 '15 at 20:42

You can motivate the solution by thinking about it in terms of symmetries (groups, in math
language). No matter the order of the set of numbers, the answer should be the same. If you're
going to use k functions to help determine the missing elements, you should be thinking about
what functions have that property: symmetric. The function s_1(x) = x_1 + x_2 + ... + x_n is
an example of a symmetric function, but there are others of higher degree. In particular, consider
the elementary symmetric functions. The elementary symmetric function of degree 2 is s_2(x)
= x_1 x_2 + x_1 x_3 + ... + x_1 x_n + x_2 x_3 + ... + x_(n-1) x_n , the sum of all
products of two elements. Similarly for the elementary symmetric functions of degree 3 and higher.
They are obviously symmetric. Furthermore, it turns out they are the building blocks for all
symmetric functions.

You can build the elementary symmetric functions as you go by noting that s_2(x,x_(n+1)) =
s_2(x) + s_1(x)(x_(n+1)) . Further thought should convince you that s_3(x,x_(n+1)) = s_3(x)
+ s_2(x)(x_(n+1)) and so on, so they can be computed in one pass.
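
The one-pass update above could be sketched in Python like this. The function name elementary_symmetric and the optional modulus argument p are assumptions for illustration; the list s holds s_0 = 1, s_1, ..., s_k.

```python
def elementary_symmetric(stream, k, p=None):
    """Return [s_0, s_1, ..., s_k] for the values in stream, in one pass."""
    s = [1] + [0] * k
    for x in stream:
        # Go from high degree to low so that s[j - 1] is still the value
        # from before x was added: s_j(new) = s_j(old) + s_(j-1)(old) * x.
        for j in range(k, 0, -1):
            s[j] += s[j - 1] * x
            if p is not None:
                s[j] %= p   # optional reduction mod a prime p
    return s


print(elementary_symmetric([1, 2, 3], 3))        # [1, 6, 11, 6]
print(elementary_symmetric([1, 2, 3], 3, p=7))   # [1, 6, 4, 6]
```

For the values 1, 2, 3 this gives s_1 = 6, s_2 = 1*2 + 1*3 + 2*3 = 11 and s_3 = 6, using only k + 1 running values.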

How do we tell which items were missing from the array? Think about the polynomial (z-x_1)(z-
x_2)...(z-x_n) . It evaluates to 0 if you put in any of the numbers x_i . Expanding the
polynomial, you get z^n-s_1(x)z^(n-1)+ ... + (-1)^n s_n . The elementary symmetric
functions appear here too, which is really no surprise, since the polynomial should stay the same if
we apply any permutation to the roots.

So we can build the polynomial and try to factor it to figure out which numbers are not in the set, as
others have mentioned.

Finally, if we are concerned about overflowing memory with large numbers (the nth symmetric
polynomial will be of the order 100! ), we can do these calculations mod p where p is a prime
bigger than 100. In that case we evaluate the polynomial mod p and find that it again evaluates to
0 when the input is a number in the set, and it evaluates to a non-zero value when the input is a
number not in the set. However, as others have pointed out, to get the values out of the polynomial
in time that depends on k , not N , we have to factor the polynomial mod p .

edited Apr 21 '15 at 19:49, answered Apr 21 '15 at 4:57 by Edward Doolittle

I have written the code using Java 8 and before Java 8. It uses a formula : (N*(N+1))/2 for sum of
all the numbers.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * @author pradeep
 *
 * Answer : SumOfAllNumbers-SumOfPresentNumbers=Missing Number;
 *
 * To GET SumOfAllNumbers : Get the highest number (N) by checking the
 * length. and use the formula (N*(N+1))/2
 *
 * To GET SumOfPresentNumbers: iterate and add it
 */
public class FindMissingNumber {
    /**
     * Before Java 8
     *
     * @param numbers
     * @return
     */
    public static int missingNumber(List<Integer> numbers) {
        int sumOfPresentNumbers = 0;
        for (Integer integer : numbers) {
            sumOfPresentNumbers = sumOfPresentNumbers + integer;
        }
        int n = numbers.size();
        int sumOfAllNumbers = (n * (n + 1)) / 2;
        return sumOfAllNumbers - sumOfPresentNumbers;
    }
    /**
     * Using Java 8 . mapToInt & sum using streams.
     *
     * @param numbers
     * @return
     */
    public static int missingNumberJava8(List<Integer> numbers) {
        int sumOfPresentNumbers = numbers.stream().mapToInt(i -> i).sum();
        int n = numbers.size();
        int sumOfAllNumbers = (n * (n + 1)) / 2;
        return sumOfAllNumbers - sumOfPresentNumbers;
    }
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        list = Arrays.asList(0, 1, 2, 4);
        System.out.println("Missing number is : " + missingNumber(list));
        System.out.println("Missing number using Java 8 is : " + missingNumberJava8(list));
    }
}

answered Mar 2 at 19:27 by Pradeep Padmarajaiah

1 Did you not read the question? This finds one missing number. OP wanted k missing numbers.
Debosmit Ray Mar 30 at 6:23

For Q2 this is a solution that is a bit more inefficient than the others, but still has O(N) runtime and
takes O(k) space.

The idea is to run the original algorithm two times. In the first one you get a total number which is
missing, which gives you an upper bound of the missing numbers. Let's call this number N . You
know that the missing two numbers are going to sum up to N , so the first number can only be in
the interval [1, floor((N-1)/2)] while the second is going to be in [floor(N/2)+1,N-1] .

Thus you loop on all numbers once again, discarding all numbers that are not included in the first
interval. The ones that are, you keep track of their sum. Finally, you'll know one of the missing two
numbers, and by extension the second.
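
A small Python sketch of this two-pass idea, under the assumption that the bag is a list of the remaining numbers from 1..n that can be scanned twice (the function name two_missing_by_halving is made up for this example):

```python
def two_missing_by_halving(bag, n):
    """Return the two numbers (a, b), a < b, of 1..n absent from bag."""
    # First pass: the usual difference of sums gives N = a + b.
    total = n * (n + 1) // 2 - sum(bag)
    # Since a != b, the smaller one lies in [1, floor((N - 1) / 2)].
    half = (total - 1) // 2
    # Second pass: sum only the numbers that fall in that lower interval.
    low_sum = sum(v for v in bag if v <= half)
    a = half * (half + 1) // 2 - low_sum
    return a, total - a


bag = [i for i in range(1, 101) if i not in (5, 15)]
print(two_missing_by_halving(bag, 100))  # (5, 15)
```

Here N = 20, so the lower interval is [1, 9]; its expected sum is 45, the bag contributes 40, and the difference is the smaller missing number 5.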

I have a feeling that this method could be generalized and maybe multiple searches run in "parallel"
during a single pass over the input, but I haven't yet figured out how.

answered May 24 at 16:51 by Svalorzen

If N is only 100, then any reasonable algorithm will use so little memory and run so fast (well
under a few milliseconds) that it is a waste of time to optimize it further or put much
thought into a complex algorithm. You are at a bigger risk of introducing bugs than achieving faster
code.

I would literally tell the person that asked the question, "Get out of my office! I have important work
to do!"

100? Really? 100 integers fit in the CPU cache of any Intel CPU made in this millennium. You could
literally run a linear search that copies the entire list and never leave the L2 cache of the CPU. It
would run so fast you wouldn't notice if we accidentally ran the algorithm twice and threw away the
first result.

Now, if the person comes back and explains that N is a lot larger, we have a different situation.
Maybe there are 10 million balls. A modern CPU can sort a million integers pretty darn fast, so I
would demand to see a benchmark that shows a dumb-but-easy-to-verify algorithm isn't fast
enough.

But what if they say that N is billions and billions?

I would suggest that someone write down the numbers on the balls as they are removed. Thus,
you've eliminated the need for any algorithm. If they refuse to keep such a log, I would sit down with
them and possibly their manager and work out a solution.

I imagine that they'd claim that there is no decent way to convey this information as the balls are
removed. They'd use terms like "leaky abstraction" and "layering violation". In which case I'd assert
that they've designed the system wrong if such a crucial design requirement (tracking the removed
balls) can't be satisfied by their design.

If they still refuse, I'd raise the issue to management. Why are they being so toxic? Why are they
removing the balls AND refusing to log what numbers are on them?

Heck, why are they even removing the balls? If there are billions and billions of balls, you're going
to have bigger problems. For example, where do you keep them all? Are you keeping an inventory?
Why doesn't the inventory track this kind of information in real time? Can't this problem be fixed at
the supply chain level?

answered Jun 23 at 11:11 by TomOnTime
2 I don't think telling the interviewer to get out of an office you don't own claiming it's yours will help your odds of
getting the job. I do appreciate that these interview questions can get pretty far-removed from real life
problems, though. GuyGreer Jun 23 at 19:22
