Arrays

. . . there is a huge difference
between working programs and good programs.
A good program not only works,
but is easy to read and maintain.

— P. A. Darnell, Ph. E. Margolis,
Software Engineering in C

Programs must be written for people to read,
and incidentally for machines to execute.

— H. Abelson, G. Sussman,
The Structure and Interpretation of Computer Programs

Computing walks on three legs:
correctness, efficiency, and elegance.

Imre Simon

An array is a data structure that stores a sequence de objects, all of the same type, in consecutive positions of the RAM (= random access) memory of the computer.  An array allows random access: any element of the array can be reached directly, without going through the previous elements (the tenth element, for example, can be reached without going first through the first, the second, etc., the ninth elements).

How does one search for a certain object in an array?  How does one insert a new object in an array?  How does one delete an element from the array?  These problems will be used as a pretext to show examples of algorithms and to illustrate the concepts of correctness, efficiency, and elegance. The three problems — search, insertion, and deletion — will reappear, in other contexts, in many chapters of this site.

Imagine that we wish to store n integer numbers in an array v.  The space occupied by the array in memory may have been created by the declaration

int v[N];

where N is a constant (possibly defined by a #define directive).  Assuming that the integers are stored in positions 0, 1, … , n-1, we shall say that

v[0..n-1]  is an integer array

or an array of integers. Of course we must have 0 ≤ n ≤ N.  If n == 0, the array v[0..n-1] is empty.  If n == N, the array is full.

Table of contents:

Exercises 1

  1. [Sedgewick]  Say (without using a computer) what will be the contents of the array  v  after the following instructions:
    int v[99];
    for (i = 0; i < 99; ++i) v[i] = 98 - i;
    for (i = 0; i < 99; ++i) v[i] = v[v[i]];
    
  2. Inversion.  Suppose that array v[1..n] contains a permutation of 1..n. Write a function to invert this permutation in the following sense:  if v[i] == j in the original array then v[j] == i after the function has been executed.

The search problem consists of the following: Given an integer x and an array v[0..n-1] of integers, find x in v, that is, find an index k such that v[k] == x.

0   0               n-1    
443   222 555 111 333 444 555 666 888 777 888 999
x   v

A solution of this problem is an algorithm that receives the values of x, nv and returns an index k.  We are immediately faced with a design decision: what shall our algorithm return if there is no k such that v[k] == x?  In other words, what shall our algorithm do if the instance of the problem has no solution?

Our design decision: return -1 if the instance has no solution. This is a good convention since -1 does not belong to the set 0..n-1 of valid indices.  To implement this convention, it is best to scan the array fom the end to the beginning:

// The function receives x, n >= 0 and v and
// returns an index k in 0..n-1 such that
// x == v[k]. If such k does not exist,
// returns -1.

int
find (int x, int n, int v[])
{
   int k;
   k = n-1;
   while (k >= 0 && v[k] != x) 
      k -= 1;
   return k;
}

It is easy to check that the function is correct.  The function is also efficient and elegant: it does not waste time, it has no unnecessary instructions and no unnecessary variables, and it does not handle special cases (like the case n == 0, for example) in separate.

An equally correct, efficient, and elegant code could be written with a for in place of the while:

   for (int k = n-1; k >= 0; --k) 
      if (v[k] == x) return k;
   return -1; 

Bad examples. To make a contrast with the function above, here are some inelegant, inefficient, or incorrect functions. The first, very popular, uses a transit, or signal, boolean variable:

   int found = 0, k = n-1;
   while (k >= 0 && found == 0) { // inelegant
      if (v[k] == x) found = 1;   // inelegant
      else k -= 1;
   }
   return k;

The second, also quite popular, unnecessarily handles the empty array in separate:

   if (n == 0) return -1; // inelegant
   else {
      int k = n-1;
      while (k >= 0 && v[k] != x) k -= 1;
      return k; 
   } 

The next function is inefficient (because it continues running after a solution has been found) and inelegant (because it unnecessarily initializes a variable):

   int k = 0;                 // inelegant
   int sol = -1;
   for (k = n-1; k >= 0; --k) // inefficient
      if (v[k] == x) sol = k; // inelegant
   return sol; 

Some functions seem reasonable at first sight, but are simply wrong. In the following example, the last iteration makes the mistake of accessing v[-1]:

   int k = n-1;
   while (v[k] != x && k >= 0) // wrong!
      k -= 1;
   return k;

(The comparisons should have been done in the correct orderk >= 0 && v[k] != x.)

Sentinel.  We can make a different design decision and return n, instead of -1, in case x is not in v[0..n-1].  In this case, it is better to scan the array from the beginning to the end:

   int k = 0;
   while (k < n && v[k] != x)
      k += 1;
   return k; 

The function is even better if we use a sentinel to avoid the repeated comparisons of k with n:

   v[n] = x;  // sentinel
   int k = 0;
   while (v[k] != x)
      k += 1;
   return k;

(We are assuming here that the array is not full and therefore there is room for the sentinel.)

Exercises 2

  1. What is the invariant of the iterative process in the function find?  [Solution]
  2. Suppose that two or more elements of the array v[0..n-1] are equal to x.  Which will be found by the function find?
  3. How many comparisons between x and elements of v does the function find do?
  4. Criticize the following version of the function find:
    int k = 0;
    while (k < n && v[k] != x) k += 1;
    if (v[k] == x) return k;
    else return -1; 
    
  5. Criticize the following version of the function find:
    int sol;
    for (int k = 0; k < n; ++k) {
       if (v[k] == x) sol = k;
       else sol = -1; 
    }
    return sol;
    
  6. Boolean version.  Write a function that receives x, v, and n and returns 1 if x is in v[0..n-1] and 0 otherwise.
  7. Maximum.  The function below promises to find the value of a largest element of v[0..n-1]. Does the function fulfill its promise?
    int maxi (int n, int v[]) {       
       int m = v[0];
       for (int j = 1; j < n; ++j)
          if (v[j-1] < v[j]) m = v[j];
       return m;
    }
    
  8. Maximum.  The following function promises to calculate the value of a largest element of an array v[0..n-1]. Does the problem make sense when n is 0? Is the function correct?
    int maximum (int n, int v[]) { 
       int x = v[0];
       for (int j = 1; j < n; j += 1)
          if (x < v[j]) x = v[j];
       return x;
    }
    

    Does it make sense to subsitute  x = 0  for  x = v[0],  as some careless programmers do?  Does it make sense to substitute  x = INT_MIN  for  x = v[0]?  Does it make sense to substitute  x <= v[j]  for  x < v[j]?  [Partial solution: ./solutions/array10.html]

  9. Longest null segment.  Write a function to compute the length of a longest segment (i.e, a longest run) of zeros in an array of integers. Try to examine the least possible number of elements of the array.

Here is a recursive solution (see the Recursion and recursive algorithms chapter) of the search problem:

// Receives x, n >= 0 and v and returns k
// such that 0 <= k < n and v[k] == x. 
// If such k does not exist, returns -1.

int rfind (int x, int n, int v[]) {
   if (n == 0) return -1;
   if (x == v[n-1]) return n-1;
   return rfind (x, n-1, v);
}

How does this work?  The number of relevant elements of v is n. If n is 0 then v[0..n-1] is empty and therefore x is not in v[0..n-1]. Now assume that n > 0; in this case, x is in v[0..n-1] if and only if

x is equal to v[n-1]  or  x is in v[0..n-2].

In short: the code reduces the instance that looks for x in v[0..n-1] to the instance that looks for x in v[0..n-2].

Bad example.  The following alternative to the function rfind is very inelegant. It starts the recursion at n == 1, a case that demands some work to solve. (As a side-effect, the function is unable to handle the n == 0 instance of the problem.)

int ugly (int x, int n, int v[]) {
   if (n == 1) {               // inelegant
      if (x == v[0]) return 0; // inelegant
      else return -1;
   }
   if (x == v[n-1]) return n-1;
   return ugly (x, n-1, v);
}

Exercises 3

  1. Suppose that two or more elements of the array v[0..n-1] are equal to x.  Which will be found by the function rfind?
  2. How many comparisons between x and elements of array v will the function rfind do?
  3. Criticize the following variant of the function rfind:
    int rfnd (int x, int n, int v[]) {
       if (n == 0) return -1;
       int k = rfnd (x, n-1, v);
       if (k != -1) return k;
       if (x == v[n-1]) return n-1;
       return -1; 
    }
    
  4. Criticize the following variant of the function rfind:
    int rfnd (int x, int n, int v[]) {
       if (v[n-1] == x) return n-1;
       else return rfnd (x, n-1, v); 
    }
    
  5. Criticize the following variant of the function rfind:
    int rfnd (int x, int n, int v[]) {
       if (v[n-1] == x || n == 0) return n-1;
       else return rfnd (x, n-1, v); 
    }
    
  6. Write a recursive function that will receive an integer x, an array v, and indices i and m and will return an index k in the set i..m-1 such that  v[k] == x.  if such k does not exist, return i-1.

Deletion

Let's say that we wish to delete the element of the array v[0..n-1] whose index is k. To do this, we shall shift the segment v[k+1..n-1] of the array to the range k..n-2.  For example, the result of the deletion of the element at index 3 in the array  0 11 22 33 44 55  will be the array  0 11 22 44 55 .  Of course the problem makes sense if and only if 0 ≤ k < n.

The following function solves the problem and returns the value of the deleted element:

// This function receives an array v[0..n-1]
// and an index k such that 0 <= k < n.
// It deletes the element whose index is k
// and returns the value of that element.

int
delete (int k, int n, int v[])
{
   int x = v[k];
   for (int j = k+1; j < n; ++j)  
      v[j-1] = v[j];
   return x;
}

Everything works well even when k is n-1 and when k is 0.  Of course the deletion will take more time when k is small compared to n.

How to use this function?  For example, to delete the element at index 51 of v[0..n-1] (assuming that 51 < n) it is enough to say

   x = delete (51, n, v);
   n -= 1; // updates the value of n

Recursive version.  It is a good exercise to write a recursive version of the delete function. The size of an instance of the problem is measured by the number n-k and the instance is considered small if n-k is 1. Therefore, the base case of the recursion is the one where k is n-1.

// This function receives an array v[0..n-1]
// and an index k such that 0 <= k < n.
// It deletes the element whose index is k
// and returns the value of that element.

int rdelete (int k, int n, int v[]) {
   int x = v[k];
   if (k < n-1) {
      int y = rdelete (k+1, n, v); 
      v[k] = y;
   }
   return x;
}

Exercises 4

  1. What happens if we change v[j-1] = v[j] into v[j] = v[j+1] in the code of function delete?
  2. Criticize the following versions of the core of the delete function:
    for (int j = n-1; j > k; --j)  
       v[j-1] = v[j];
    
    for (int j = k+1; j < n; ++j)  
       v[j-1] = v[j];
    v[n-1] = 0;
    
    if (k < n - 1) 
       for (int j = k+1; j < n; ++j)  
          v[j-1] = v[j];
    
  3. Write a version of the delete function that will decrement the value of n after the deletion. (Hint: Pass the address of variable n to the function.)
  4. Discuss the following version of rdelete:
    int rdelete2 (int k, int n, int v[]) {
       int x = v[k];
       if (k < n-1) {
          rdelete2 (k, n-1, v);
          v[n-2] = v[n-1]; }
       return x; 
    }
    
  5. Rediscuss the problem of deletion under more general conditions: Suppose that the relevant part of the array is v[i..m-1]; then, to delete element v[k], pull v[k+1..m-1] to the left or push v[i..k-1] to the right, depending on which of the alternatives is faster.  (And do not forget to update the values of i and m.)

Insertion

Given an array v[0..n-1] of numbers, we wish to insert a number x between the elements whose indices are k-1 and k.  This makes sense not only when 1 ≤ k ≤ n-1 but also when k is 0 (insert at the beginning) and when k is n (insert at the end). That is, it makes sense for any k in the set 0..n.

Of course we must not do any insertion if the array is full. Therefore, make sure that n+1 ≤ N before calling the function.

// This function inserts x between the 
// positions k-1 and k of the array v[0..n-1] 
// assuming that 0 <= k <= n.

void
insert (int k, int x, int n, int v[])
{
   for (int j = n; j > k; --j)  
      v[j] = v[j-1];
   v[k] = x;
} 

The function works well even when we wish to insert at the beginning of the array and when we wish to insert at the end.

For exemple, to insert a new element having value 999 between the positions 50 and 51 (assuming 51 ≤ n) all we have to do is

   insert (51, 999, n, v);
   n++;  // updates n

Recursive version.  It is a good exercise to write a recursive version of insert:

// This function inserts x between the 
// positions k-1 and k of the array v[0..n-1] 
// assuming that 0 <= k <= n.

void rinsert (int k, int x, int n, int v[]) {
   if (k == n)  v[n] = x;
   else {
      v[n] = v[n-1];
      rinsert (k, x, n - 1, v);
   }
}

Exercises 5

  1. Write a function that will insert x between positions k and k+1 in an array v[0..n-1]. Write a good documentation of the function.
  2. Write a version of the function insert that will increment the value of n after the insertion. (Hint: Pass the address of variable n to the function.)
  3. Discuss the following version of rinsert:
    int rinsert2 (int k, int x, int n, int v[]) {
       if (k == n) {
          v[n] = x;
          return n + 1; }
       else {
          int y = v[k];
          v[k] = x;
          return rinsert2 (k+1, y, n, v); } 
    }
    
  4. Rediscuss the insertion problem under more general conditions: Suppose that the relevant part of the array v is v[i..m-1]. To insert x between v[k-1] and v[k], you have two options: either push v[k..m-1] to the right or pull v[i..k-1] to the left. Choose the faster option.  (And do not forget to update the values of i and m.)

Search and delete

Now consider a combined search and delete problem. Suppose we wish to delete all the zero elements from the array  v[0 .. n-1].  Of course the problem makes sense with any n0 (the case where n == 0 is trivial).  For exemple, if n is 7 and v[0..6] is

11 22 0 0 33 0 44

then the resulting array must be  11 22 33 44. The function must return the number of elements of the array after the deletion of all zeros.

Here is an elegant and efficient solution of the problem:

// This function deletes all the zero elements
// of v[0..n-1]. It assumes only that n >= 0.
// The function leaves the result in v[0..i-1]
// and returns i.
    
int
deletezeros (int n, int v[])
{
   int i = 0;
   for (int j = 0; j < n; ++j)  
      if (v[j] != 0) 
         v[i++] = v[j];
   return i;
}

(The instruction  v[i++] = v[j]  has the same effect as  v[i] = v[j]; i += 1.)  At the beginning of each iteration, the following invariant holds:  v[0..i-1]  is the array that results from the deletion of all zeros from v[0..j-1];  of course i ≤ j.

0         i     j       n
999 999 999 999 999 000 999 000 999 000 999 000 999
no-zeros version of
original v[0..j-1]
garbage

Everything works well even when n is 0. It also works well when the array has no zeros. It also works well when the array has nothing but zeros.

To delete all zeros from an array v[0..n-1], just say

n = deletezeros (n, v);

Bad examples.  The following version is worse than the one above because it wastes space (by using an auxiliary array of n elements):

int vv[1000], i = 0;
for (int j = 0; j < n; ++j) 
   if (v[j] != 0) vv[i++] = v[j];
for (int j = 0; j < i; ++j)  
   v[j] = vv[j];
return i;

Now a version that is simply wrong (why?):

for (int i = 0; i < n; ++i) 
   if (v[i] == 0) {
      for (int j = i; j+1 < n; ++j)  
         v[j] = v[j+1];
      n -= 1;
   }
return n; 

Next, one more inelegant and inefficient version. It is inefficient because the instruction v[j] = v[j+1] does not copy v[j+1] to its final place and therefore has to be repeated many times:

int i = 0;
while (i < n) {
   if (v[i] != 0)  i += 1;
   else {
      for (int j = i; j+1 < n; ++j)  
         v[j] = v[j+1];
      --n;
   } 
}
return n; 

To better feel the inefficiency of this version, consider the following example. Suppose that n is 100 and that all the elements of v except v[99] are zeros. The version above will copy elements of v from one place to another 4950 times, while the first version of deletezeros will copy elements only once!

Next, and even more inelegant and inefficient version.  Changing the value of i within the for controlled by i is a very bad way to achieve the desired effect.  Moreover, variable j is unnecessarily initialized and the part i < n-1 of the if condition is supefluous.

int j = 0;
for (int i = 0; i < n; ++i) 
   if (i < n-1 && v[i] == 0) {
      for (j = i; j+1 < n; ++j)  
         v[j] = v[j+1];
      --n;
      --i; // bad...
   }
return n; 

The following version is no more efficient than the previous one, but at least it is not so crooked:

for (int i = n-1; i >= 0; --i)
   if (v[i] == 0) {
      for (int j = i; j < n-1; ++j) 
         v[j] = v[j+1];
      --n;
   }
return n;

Recursive version.  Finally, here is a recursive version of deletezeros. Notice how the instruction v[m] = v[n-1] puts v[n-1] in its final place.

// The function rdeletezeros deletes all the
// zero elements of v[0..n-1]. It leaves the
// result in v[0..i-1] and returns i.
    
int rdeletezeros (int n, int v[]) {
   if (n == 0) return 0;
   int m = rdeletezeros (n - 1, v);
   if (v[n-1] == 0) return m;
   v[m] = v[n-1];
   return m + 1;
}

Next, a crooked and inefficient version. Unlike the previous version, it does not immediately put each element of the array in its final place.  Observe also that there are two recursive calls and they have different forms; a bad sign!

// Receives 0 <= k <= n and deletes the zeros
// from v[k..n-1]. The array without the zeros 
// will have the form v[k..m-1]. The function
// returns the value of m.

int rbad (int k, int n, int v[]) {
   if (k == n) return n;
   if (v[k] != 0) 
      return rbad (k+1, n, v);
   for (int i = k; i < n-1; ++i) v[i] = v[i+1];
   return rbad (k, n-1, v);
}

Sometimes, a recursive version of a function requires additional parameters, as we see here with the parameter k. It is essential to explain to the user, as we did above, what is the role of the additional parameter.  (But, of course, not even the best of explanations will fix an ill-conceived function.)

Exercises 6

  1. Criticize the following function. It promises to delete the zeros from v[0..n-1], to leave the result in v[0..m-1], and to return m.
    int delete0 (int n, int v[]) {
       int z = 0;
       for (int i = 0; i < n; ++i) {
          if (v[i] == 0) z += 1;
          v[i-z] = v[i]; }
       return n - z; 
    }
    
  2. Criticize the following function. It promises to delete the zeros from v[0..n-1], to leave the result in v[0..m-1], and to return m.
    int delete0 (int n, int v[]) {
       int z = 0;
       for (int i = 0; i < n - z; ++i) {
          if (v[i] == 0) {
             z += 1;
             for (int k = i; k < n - z; ++k) 
                v[k] = v[k+1];
             --i; } }
       return n - z; 
    }
    
  3. Write a recursive function that will delete all the # characters from an array c[0..n-1] of ASCII characters.  Example: If n is 7 and the array contains   a b c # # d #   then the result must be   a b c d.
  4. ★ Write a function that receives two ASCII strings, say str and del, and deletes from str all the characters that are in del. For example, if str is  aaa*$-bbb#ccc*  and del is  $#*  then the final value of str must be  aaa-bbbccc.  Make your function as efficient as you can. Your function should not allocate any new array.
  5. A small application.  Write a program to manage a collection of numbers typed in by the user. The collection may contain more than one copy of the same number. The user can delete numbers from the collection and insert numbers into the collection. If the user types
    i 222
    

    (followed by ) the number 222 is inserted into the collection.  If the user types

    d 333
    

    one copy of the number 333 is deleted from the collection. (If there is no copy, the command is ignored.)  If the user types any character other than 'i' or 'd', the execution of the program terminates.  After each insertion or deletion the program must display the collection. The collection should be kept in increasing order throughout the process.  [Solution: ./solutions/array2.html]

Exercises 7: challenges

  1. Josephus problem.  Imagine the integers 1 to n arranged in a circle. (Hence, the next number after n is 1.)  Go around the circle starting at 1. As you go around, delete every m-th number until there is only one left. What will be the surviving number?
  2. Constant segment.  Let's say that a segment v[i..j] of an array v[0..n-1] is constant if all its elements have the same value.  Write a simple and efficient function to calcule the length of a longest constant segment of v[0..n-1].
  3. Subsequence.  A subarray of an array v is whatever is left after some of the elements of v have been deleted.  (For example,  12 13 10 3  is a subarray of  11 12 13 11 10 9 7 3 3  but not of  11 12 10 11 13 9 7 3 3.)  Write an efficient function to decide whether x[0..m-1] is a subarray of v[0..n-1].
  4. Longest increasing subsequence.  Write a efficient function to find a longest increasing subarray of a given arrray v[0..n-1].
  5. Max sum segment.  The sum of an array v[i..k] of integers is the number v[i] + v[i+1] + ... + v[k].  The height of an array v[1..n] is the sum of a nonempty segment of v[1..n] having the largest sum.  In other words, the height of v[1..n] is the largest number of the form v[i] + v[i+1] + ... + v[k] with 1 ≤ i ≤ k ≤ n.  (Example: One of the segments of the array  20 -30 15 -10 30 -20 -30 30  has sum 15 - 10 + 30 = 35.  Is there a segment with larger sum?)  Write a function to compute the height of an array v[1..n]. Make the algorithm as efficient as you can.
  6. Sum of two distant elements.  Given an array v[0..n-1] of numbers and an integer d such that 0 ≤ d ≤ n-1, find the largest number of the form v[i] + v[j] with j - i ≥ d.

Questions and answers