Documentation

Programming is best regarded as the process of
creating works of literature, which are meant to be read.

— Donald E. Knuth, Literate Programming

Comments are, at best, a necessary evil.
The proper use of comments is to compensate
for our failure to express ourself in code  […]
Truth can only be found in one place: the code.

— Robert C. Martin, Clean Code

Use definite, specific, concrete language.
Write with nouns and verbs.
Put the emphatic words at the end.
Omit needless words.
— W. Strunk, Jr. & E.B. White, The Elements of Style

Some people believe that to document a program you must write many comments interpersed with the code.  This is wrong!  A good documentation does not pollute the code with comments. A good documentation restricts itself to

explain what each function of the program does.

A good documentation does not waste time trying to explain how the function does what it does, because the interested reader can read the code.

The distinction between what and how is the same as the one between the interface (the .h file) and the implementation (the .c file) of a C library.  The following analogy helps to make the difference easier to understand. A deliveries company promises to pick up your package in Amsterdam and deliver it in New York. This is what the company does. How the job will be done — whether the package will travel by ship or by airplane, for example — is an internal matter of the company.

In short, the documentation of a function is a manual that gives complete instructions on the correct use of the function.  (In this sense, the idea of documentation coincides with that of an API.)  First, the documentation must specify what the function receives and what it returns. Then it must state, very precisely, the effects the function produces, i.e., the relation between what the function receives and what it returns.

A correct documentation is a matter of intellectual honesty, as it gives the reader/user the power to prove that the function is wrong, when that is the case.

Table of contents:

Documentation example

Elsewhere in this site there is a function that finds the value of a largest element of an array. Here we repeat that code together with a perfect documentation:

// The following function receives a number
// n >= 1 and an array v and returns the
// value of a largest element of v[0..n-1].

int max (int n, int v[]) { 
   int x = v[0];
   for (int j = 1; j < n; j += 1)
      if (x < v[j]) 
         x = v[j];
   return x;
}

Notice how simple and precise the documentation is. It says what the function does but does not waste time trying to explain how the function works (for example, whether the function is recursive or iterative, whether it scans the array from left to right or from right to left, and so on).  Notice also the absence of useless comments (like index j will traverse the array and x is the largest up to now) polluting the code.

See next some examples of bad documentation of the function. To simply say that

the function returns the value of a largest
element of an array

is indecently vague, as it does not mention the parameters n and v of the function!  To say that

the function returns the value of a largest
element of an array v

is a little better, but still very vague, as it does not state the role of parameter n.  To say that

the function returns the value of a largest
element of an array v that has n elements

is better, but still vague: the reader does not know whether the array is v[0..n-1] or v[1..n].  To say that

the function returns the value of a largest
element of the array v[0..n-1]

is almost good, but withholds the information that the function only makes sense when n ≥ 1.

Another documentation example

In another chapter of this site there is a function that decides whether a number x is equal to some element of an array v.  Here is the code of that function together with a perfect documentation:

// Receives a number x, an array v, and
// an index n >= 0. Returns 1 if x is in
// v[0..n-1] and returns 0 otherwise.

int find (int x, int n, int v[]) {
   int j = 0;
   while (j < n && v[j] != x)  
      j += 1;
   if (j < n) return 1;
   else return 0; 
}

The documentation describes, in a precise and complete manner, what the function does. It does not waste time trying to explain how the function does the job. Contrast this with the following examples of bad documentation.  To say that

the function decides whether x is in
v[0..n-1] 

is a little vague, since the reader is left to guess what the function returns.  To say that

the function decides whether x is in the
array v 

is very vague, as it does not explain the role of the parameter n.  To say that

the function decides whether a number is
in a given array 

is absurdly vague, since it does not even mention the parameters of the function!

One more example

Here is a the code of a function accompanied with perfect documentation:

// Receives x, v, and n >= 0 and returns j 
// such that 0 <= j < n and v[j] == x.
// If such j does not exist, returns n.

int where (int x, int v[], int n) {
   int j = 0;
   while (j < n && v[j] != x)  
      j += 1;
   return j;
} 

Exercises 1

  1. Consider the following documentation of a function:  This function receives integers p, q, r, s and returns the average of p, q, r.  What is wrong?
  2. Consider the following documentation of a function:  This function receives integers p, q, r such that p <= q <= r and returns the average of p, q, r.  What is wrong?

Notes about documenting a function

1. The documentation of a function has the important role of separating the responsibility of the programmer from that of the user. It is the responsibility of the programmer to say what are the valid values of each parameter of the function; it is the responsibility of the user to check, before calling the function, whether the values of the arguments are valid.  With this undestanding, the programmer is freed from the burden of checking the validity of the arguments and can dedicate all his/her attention to the design and coding of the function.  (In critical production code, the programmer may, of course, wish to check the validity of the arguments.)

2. In the other chapters of this site, for typographic expediency, the documentation of many functions is not integrated with the code (as a  // comment )  but rather embedded in the text that precedes the code.

3. There are excellent tools for integrating code with documentation.  See, for example, the CWEB system of literate programming by Donald Knuth. As a concrete illustration, this is what a program looks like before it is automatically converted into C code.

Invariants of an iterative process

There is one situation where comments embedded in the code are useful.  The body of many functions consists of an iterative process (controled by a for or a while). In such cases, you may enrich the documentation by stating the invariants of the iterative process. An invariant is a relation between the values of the variables

that holds at the beginning of each iteration,

i.e., a relation that does not change from one iteration to the next.  These invariant relations explain the iterative process and lead to a proof, by induction, of the correctness of the process, i.e., a proof that the process has the desired effect.

Example 1.  The function max finds the value of a largest element of v[0..n-1]. The comment embedded in the code states the invariant of the iterative process:

int max (int n, int v[]) { 
   int x = v[0];
   for (int j = 1; j < n; ++j)
      // at this point, x is the value of  
      // a largest element of v[0..j-1]
      if (x < v[j]) 
         x = v[j];
   return x;
}

Example 2.  Let's say that a segment v[i..j] of an array v[0..n-1] is constant if all its elements have the same value.  The function scmax receives an array v[0..n-1], with n > 0, and returns the length of a longest constant segment. The comment embedded in the code states the invariant of the iterative process:

int scmax (int v[], int n) {
   int i = 0, max = 0;
   while (/*A*/ i < n) { // at the point A, 
      // 1. max is the length of a longest
      //    constant segment of v[0..i-1] and
      // 2. i == 0 or v[i-1] != v[i] or i == n
      int j = i+1;
      while (j < n && v[j] == v[i]) ++j;
      if (max < j-i) max = j-i;
      i = j;
   }
   return max;
}

Invariants are essential to understand why a function, or an algorithm, is correct. Correctness proofs based on invariants figure in the chapters dedicated to binary search, to basic sorting algorithms, to mergesort, to heapsort, and to quicksort.

Invariants do, sometimes, help to choose between two equivalent versions of a piece of code: the more elegant version has simpler invariants.

Exercises 2

  1. An inexperienced programmer claims that the following proposition is an invariant of the function max above:  x is the largest element of the part of the array v examined so far.  Criticize this claim.
  2. Is the following documentation of the function scmax correct?
    // the function receives an increasing array 
    // v[0..n-1] with n >= 1 and returns the
    // length of a longest constant segment
    // of the array
    

Questions and answers