Searching sorted arrays

Binary search is to algorithms
what a wheel is to mechanics:
It is simple, elegant, and immensely important.

— Udi Manber, Introduction to Algorithms

Binary search is a notoriously tricky algorithm
to program correctly.
It took seventeen years after its invention
until the first correct version of binary search
was published!

— Steven Skiena, The Algorithm Design Manual

This chapter studies the following search problem:  find a given number in a sorted array of numbers.  More specifically, given an integer x and an increasing array v[0..n-1] of integers,

find an index  m  such that  v[m] == x .

Some instances of this problem have no solution, of course.  For example, the instances where n is 0 and those in which v[0] < x < v[1] have no solution.

Table of contents:

The search problem

We begin by making a more general and more interesting formulation of the problem:  find the place in the array where x should be.  More precisely, given x and an increasing array v[0..n-1], find an index  j  such that 

v[j-1]  <  x  ≤  v[j] .

To obtain a solution to the original problem, all we need to do is compare x with v[j].

To best expoit this formulation of the problem, we allow the solution j to assume the values  0  and  n.  In these two cases, the expression v[j-1] < x ≤ v[j] must be interpreted with some flexibility:  if j == 0, the expression reduces to  x ≤ v[0], since v[-1] is undefined; and if j == n, the expression reduces to  v[n-1] < x, since v[n] is undefined.  It is as if v[-1] had value −∞ and v[n] had value +∞.

In the following example, if x is 555, the solution is 4;  if x is 800, the solution is 8;  and if x is 1000, the solution is 13.

0                       n-1
111 222 333 444 555 555 666 777 888 888 888 999 999

We shall examine two algorithms for the problem: one obvious but slow and one sophisticated and much faster.

Exercises 1

  1. Write a function to decide whether an array v[0..n-1] is in increasing order.  Then, criticize the following code.
    int check (int v[], int n) {
       int previous = v[0], yes = 1;
       for (int i = 1; i < n && yes; ++i) {
          if (previous > v[i]) yes = 0;
          previous = v[i]; }
       return yes; }
    
  2. Criticize the following alternative formulation of the search problem: given x and an increasing array v[0..n-1], find an index j such that v[j-1] ≤ x ≤ v[j].
  3. Criticize the alternative formulation of the search problem built around the expression v[j-1] < x < v[j].

We begin with an obvious algorithm, one that examines the elements of the array in order, one by one. Here is an implementation of the algorithm:

// This function receives an integer x and
// an increasing array v[0..n-1] and returns
// an index j in 0..n such that v[j-1] <
// x <= v[j].

int 
sequentialsearch (int x, int n, int v[]) {
   int j = 0;
   while (j < n && v[j] < x) 
      ++j;
   return j;
}

How many iterations does the function execute?  Better: how many comparisons between x and elements of v does it do?  In the worst case, x is compared with each element of the array and therefore the total number of comparisons is  n.  Hence, if the size of the array were to be multiplied by 100, the number of comparisons would be also multiplied by 100.

The time consumption of the function is proportional to the number of comparisons that involve x, and therefore proportional to  n  in the worst case.

Is it possible to solve the problem in less time?  Is it possible to solve the problem without comparing x with each element of the array?  As we shall see, the answer is affirmative.

Exercises 2

  1. Invariants.  What are the invariants of the iterative process in the function sequentialsearch?  Use the invariants to show that the function is correct.  [Solution: ./solutions/bubi9.html]
  2. Discuss the following version of the function sequentialsearch:
    int seqSearch2 (int x, int n, int v[]) {
       int j;
       for (j = 0; j < n; ++j) 
          if (x <= v[j]) break;
       return j; }
    
  3. Criticize the following version of the function sequentialsearch:
    int seqSearch3 (int x, int n, int v[]) {
       int j = 0;
       while (v[j] < x && j < n) ++j;
       return j; }
    
  4. Discuss the following recursive version of the function sequentialsearch:
    int seqSearch4 (int x, int n, int v[]) {
       if (n == 0) return 0;
       if (x > v[n-1]) return n;
       return seqSearch4 (x, n-1, v); }
    
  5. Write a version of the function sequentialsearch that will solve the following variation of the problem: given x and an increasing array v[0..n-1], find an index j such that v[j] x < v[j+1].

Binary search

There is an algorithm much faster than sequential search. It is analogous to the method that one uses to find a word in a dictionary of old (the one that looks like a thick book).  Of course the ideia only works because the array is sorted.

// This function receives an integer x
// and an increasing array v[0..n-1]
// and returns an index j in 0..n 
// such that v[j-1] < x <= v[j].

int 
binarysearch (int x, int n, int v[]) { 
   int l = -1, r = n;  // attention!
   while (l < r-1) {  
      int m = (l + r)/2;
      if (v[m] < x) l = m;
      else r = m; 
   }
   return r;
}

Simple and clean!  The names of the variables were not chosen at random:  l  stands for leftm  stands for middle  and  r  stands for right.  The result of the division by 2 in the expression (l+r)/2 is automatically truncated, since the variables are of type int.  For example, if l is 6 and r is 9, the value of the expression (l+r)/2 is 7.

0           l     r     n-1
111 222 333 444 555 555 666 777 888 888 888 999 999

The ideia of binary search is a veritable egg of Columbus! This ideia is the point of departure of several efficient algorithms to many problems.

Exercises 3

  1. To avoid values of indices outside the interval 0..n-1, we could replace the line l = -1; r = n of the function binarysearch by the three following lines. Discuss this variant of the code.
       if (v[n-1] < x) return n;
       if (x <= v[0]) return 0;  
       // now v[0] < x <= v[n-1]
       l = 0; r = n-1;
    
  2. Answer the following questions about the function binarysearch.  What happens if  while (l < r-1)  is replaced by  while (l < r)?  or by  while (l <= r-1)?  What happens if we replace  (l + r)/2  by  (l + r - 1)/2  or by  (l + r + 1)/2  or by  (r - l)/2?  What happens if  if (v[m] < x)  is replaced by  if (v[m] <= x)?  What happens if  l = m  is replaced by  l = m+1  or by   l = m-1?  What happens if  r = m  is replaced by  r = m+1  or by  r = m-1?
  3. Suppose n = 9 and v[i] = i for each i. Execute binarysearch for various values of x.  Repeat the exercise with n = 15 and various values of x.  Repeat the exercise with n = 14 and x = 9.
  4. Execute the binarysearch function with n = 16. What are the possible values of  m  in the first iteration? What are the possible values of m in the second iteration? What about the third iteration? What about the fourth?
  5. In the function binarysearch, is it true that m is in 0..n-1 every time the instruction if (v[m] < x) is executed?  (Note that l and r may not be in 0..n-1.)
  6. Check the validity of the following claim: when n+1 is a power of 2, the expression (l + r) is divisible by 2 whatever v and x may be.

Is the function binarysearch correct?

To understand the function binarysearch, all we need to do is check that, at the beginning of each repetition of the while, immediately before  l  is compared with  r-1,  the relation

v[l] < x ≤ v[r]

holds. This relation is, therefore, invariant.  (The algorithm was actually built around this relation!)  Note the similarity between the invariant and the goal  v[j-1] < x ≤ v[j]  we are pursuing.  The invariant holds, in particular, at the beginning of the first iteration: just pretend that v[-1] is −∞ and v[n] is +∞.

At the beginning of each iteration, by virtue of the invariant, we have v[l] < v[r] and therefore l < r, since the array is increasing. At the beginning of the last iteration, we have l >= r-1 and therefore l == r-1.  The invariant relation now shows that, by returning r, the function is behaving as promised!

In each iteration we have l < m < r  (why?).  Hence, both  r - m  and  m - l  are strictly smaller than r - l. Therefore, the sequence of values of  r - l  is strictly decreasing. It is for this reason that the execution of the function terminates, sooner or later.

Exercises 4

  1. Proof of invariant.  Show that at the beginning of each iteration of binarysearch (that is, immediately before l is compared with r-1) we have  v[l] < x ≤ v[r].
  2. Variations of the algorithm.  Write a version of the binarysearch function that has the following invariant: at the beginning of each iteration,  v[l-1] < x ≤ v[r].  Repeat with  v[l-1] < x ≤ v[r+1].  Repeat with  v[l] < x ≤ v[r+1].
  3. Large arrays.   If the number of elements of the array is close to INT_MAX, the code of the binary search may derail, by virtue of an arithmetic overflow, when computing  m = (l + r)/2.  How to avoid this?

Performance of binary search

How many iterations the function binarysearch executes when processing an array with n elements?  At the beginning of the first iteration, r - l is approximately n.  At the beginning of the second, it is approximately n/2.  At the beginning of the third, approximately n/4. At the beginning of the k+1-th, approximately n/2k.  When k surpasses log n, the value of n/2k becomes smaller than 1 and the execution of the algorithm stops.  (See the computation of log n in another chapter.)  Hence, the number of iterations is approximately

log n

in the worst case as well as in the best.  The approximate number of comparisons of x with elements of the array is also log n.  Here are some examples when n is power of 2:

n   32 1024 32768 1048576 33554432
log n   5 10 15 20 25

The value of log n grows very slowly since log transforms multiplications into additions.  For example, if a search in an array of size n requires C comparisons, then a search in an array of size 2n will require only C+1 comparisons, a search in an array of size 8n will require only C+3 comparisons, and a search in an array of size 100n will require less than C+7 comparisons.

The time binarysearch consumes is proportional to the number of comparisons between x and elements of v.  Hence, the time consumption is proportional to  log n.

Exercises 5

  1. Suppose the array v has 511 elements and x is not in the array.  How many times, exactly, binarysearch will compare x with an element of the array?  Now suppose that the array v has 50000 elements and x is not in the array.  How many times, approximately, binarysearch will compare x with an element of the array?
  2. If we need t seconds to do a binary search in an array of n numbers, how much time do we need in order to do a search in an array of n2 numbers?
  3. Exact number of iterations.  Show that, in the worst case, binarysearch does exactly 1 + lg (n) comparisons of x with elements of the array, where lg (n) is the floor of log n.

Exercises 6

Implementations of binary search require care and attention to details. It is very easy to write an implementation that gives wrong answers or ends up in an infinite loop.  The exercises below discuss alternative versions of binarysearch that are different from the one examined above. All strive to find x in v[0..n-1]. All would like to produce an index j in 0..n such that v[j-1] < x ≤ v[j].

  1. Show that the following version of binarysearch is correct. It is almost as beautiful and smooth as the version discussed above.
       l = 0; r = n;
       while (l < r) { // v[l-1] < x <= v[r]
          m = (l + r)/2; 
          if (v[m] < x) l = m+1;
          else r = m; 
       } // l == r
       return r;
    
  2. Show that the following version of binarysearch is correct. It is a little less beautiful and smooth than the previous versions.
       l = 0; r = n-1;
       while (l <= r) { // v[l-1] < x <= v[r+1]
          m = (l + r)/2;
          if (v[m] < x) l = m+1;
          else r = m-1; } // l == r+1
       return r+1;
    
  3. Is the following version of binarysearch correct?
       l = -1; r = n-1;
       while (l < r) {
          m = (l + r)/2;
          if (v[m] < x) l = m;
          else r = m-1; } 
       return r+1;
    
  4. Is the following version of binarysearch correct?
       l = -1; r = n-1;
       while (l < r) {
          m = (l + r + 1)/2;
          if (v[m] < x) l = m;
          else r = m-1; }
       return r+1;
    

Exercises 7

  1. ★ Write an implementation of binary search that will receive an integer x and an increasing array v[0..n-1] and will return j such that v[j-1] x < v[j].  What are the possible values of j?
  2. Write an implementation of binary search that looks for x in an increasing array v[0..n] (notice the indices).  Write an implementation of the binary search that looks for x in v[1..n].
  3. Write an implementation of binary search that looks for x in a decreasing array v[0..n-1].
  4. Modify the code of the function binarysearch to solve the search problem stated at the start of this chapter.

Recursive version of binary search

To formulate a recursive version of binary search we must slightly generalize the problem, changing  v[0..n-1]  into  v[a..b].  The bridge between the basic and the generalized formulations is a wrapper function binarysearch2 that outsources the job to a recursive function bsearch.

// This function receives an increasing array
// v[0..n-1] and an integer x and returns an
// index j in 0..n such that
// v[j-1] < x <= v[j].

int 
binarysearch2 (int x, int n, int v[]) {
   return bsearch (x, -1, n, v);
}
// Receives an increasing array v[l+1..r-1]
// and an integer x such that v[l] < x <= v[r]
// and returns an index j in l+1..r such that
// v[j-1] < x <= v[j].

static int 
bsearch (int x, int l, int r, int v[]) {
   if (l == r-1) return r;  
   else {
      int m = (l + r)/2;
      if (v[m] < x)  
         return bsearch (x, m, r, v);
      else  
         return bsearch (x, l, m, v); 
   } 
}

(The keyword static is there to indicate that the function bsearch has an auxiliary nature and will not be called directly by the user of the binary search algorithm.)

What is the depth of the recursion in the function bsearch?  In other words, how many times bsearch calls itself? Answer: approximately log n times.

Exercises 8

  1. Correctness.  Prove that the recursive function bsearch is correct.  Observe that the condition v[l] < x <= v[r] implies l <= r-1.  Also check that each call to bsearch satisfies the conditions stated in the documentation of the function.
  2. Discuss the following alternative version of the function binarysearch2:
    int binarysearch3 (int x, int n, int v[]) {
       if (v[n-1] < x) return n;
       if (x <= v[0]) return 0;
       return bsearch (x, 0, n-1, v); }
    
  3. Discuss the following alternative version of the function bsearch. Write a documentation of the function.
    int bsearch2 (int x, int l, int r, int v[]) {
       if (l > r) return r+1;
       else {
          int m = (l + r)/2;
          if (v[m] < x)  
             return bsearch2 (x, m+1, r, v);
          else  
             return bsearch2 (x, l, m-1, v); } }
    

Exercises 9

  1. Two Sum.  Write an efficient algorithm for the following problem:  given an increasing array v[0..n-1] of integers and an integer number target, find two distinct indices i and j such that  v[i] + v[j] = target.
  2. Write a function to receive a strictly increasing integer array v[0..n-1] and return an index i between 0 and n-1 such that  v[i] == i. (If no such i exists, return -1.)  The function must not do more than log n comparisons involving elements of v.
  3. Write an efficient function to receive strictly positive integers k and n and compute kn.  How many multiplications does your function execute?
  4. Array of structs.  Supose that each element of the array v[0..n-1] is a struct with two fields: the name of a student and the id number of the student. Suppose the array is in increasing order of id numbers. Write a binary search function that receives the number of a student and returns her/his name. If the number is not in the array, the function must return the empty string.
  5. Disjoint intervals.  Imagine a set of pairwise disjoint closed intervals. Suppose the extremes of the intervals are integer numbers. Write a program that will receive an integer and find the interval to which the integer belongs. For example, if the intervals are 1643..2033, 5532..7643, 8999..10332, 5666653..5669321, the number 9122 belongs to the third interval and 8122 belongs to no interval.
  6. Study the function bsearch of the stdlib library.