Programming is best regarded as the process of
creating works of literature,
which are meant to be read.
— Donald E. Knuth,
Literate Programming
Comments are, at best, a necessary evil.
The proper use of comments is to compensate
for our failure to express ourself in code
[…]
Truth can only be found in one place: the code.
— Robert C. Martin,
Clean Code
Use definite, specific, concrete language.
Write with nouns and verbs.
Put the emphatic words at the end.
Omit needless words.
— W. Strunk, Jr. & E.B. White,
The Elements of Style
Some people believe that to document a program you must write many comments interpersed with the code. This is wrong! A good documentation does not pollute the code with comments. A good documentation restricts itself to
explain what each function of the program does.
A good documentation does not waste time trying to explain how the function does what it does, because the interested reader can read the code.
The distinction between what and how is the same as the one between the interface (the .h file) and the implementation (the .c file) of a C library. The following analogy helps to make the difference easier to understand. A deliveries company promises to pick up your package in Amsterdam and deliver it in New York. This is what the company does. How the job will be done — whether the package will travel by ship or by airplane, for example — is an internal matter of the company.
In short, the documentation of a function is a manual that gives complete instructions on the correct use of the function. (In this sense, the idea of documentation coincides with that of an API.) First, the documentation must specify what the function receives and what it returns. Then it must state, very precisely, the effects the function produces, i.e., the relation between what the function receives and what it returns.
A correct documentation is a matter of intellectual honesty,
as it gives the reader/
Table of contents:
Elsewhere in this site there is a function that finds the value of a largest element of an array. Here we repeat that code together with a perfect documentation:
// The following function receives a number
// n >= 1 and an array v and returns the
// value of a largest element of v[0..n-1].
int max (int n, int v[]) {
int x = v[0];
for (int j = 1; j < n; j += 1)
if (x < v[j])
x = v[j];
return x;
}
Notice how simple and precise the documentation is.
It says what the function does
but does not waste time trying to explain how
the function works
(for example, whether the function is recursive or iterative,
whether it scans the array from left to right or from right to left,
and so on).
Notice also the absence of useless comments
(like index j
will traverse the array
and x is the largest up to now
)
polluting the code.
See next some examples of bad documentation of the function. To simply say that
the function returns the value of a largest element of an array
is indecently vague, as it does not mention the parameters n and v of the function! To say that
the function returns the value of a largest element of an array v
is a little better, but still very vague, as it does not state the role of parameter n. To say that
the function returns the value of a largest element of an array v that has n elements
is better, but still vague: the reader does not know whether the array is v[0..n-1] or v[1..n]. To say that
the function returns the value of a largest element of the array v[0..n-1]
is almost good, but withholds the information that the function only makes sense when n ≥ 1.
In another chapter of this site there is a function that decides whether a number x is equal to some element of an array v. Here is the code of that function together with a perfect documentation:
// Receives a number x, an array v, and
// an index n >= 0. Returns 1 if x is in
// v[0..n-1] and returns 0 otherwise.
int find (int x, int n, int v[]) {
int j = 0;
while (j < n && v[j] != x)
j += 1;
if (j < n) return 1;
else return 0;
}
The documentation describes, in a precise and complete manner, what the function does. It does not waste time trying to explain how the function does the job. Contrast this with the following examples of bad documentation. To say that
the function decides whether x is in v[0..n-1]
is a little vague, since the reader is left to guess what the function returns. To say that
the function decides whether x is in the array v
is very vague, as it does not explain the role of the parameter n. To say that
the function decides whether a number is in a given array
is absurdly vague, since it does not even mention the parameters of the function!
Here is a the code of a function accompanied with perfect documentation:
// Receives x, v, and n >= 0 and returns j
// such that 0 <= j < n and v[j] == x.
// If such j does not exist, returns n.
int where (int x, int v[], int n) {
int j = 0;
while (j < n && v[j] != x)
j += 1;
return j;
}
This function receives integers p, q, r, s and returns the average of p, q, r.What is wrong?
This function receives integers p, q, r such that p <= q <= r and returns the average of p, q, r.What is wrong?
1. The documentation of a function has the important role of separating the responsibility of the programmer from that of the user. It is the responsibility of the programmer to say what are the valid values of each parameter of the function; it is the responsibility of the user to check, before calling the function, whether the values of the arguments are valid. With this undestanding, the programmer is freed from the burden of checking the validity of the arguments and can dedicate all his/her attention to the design and coding of the function. (In critical production code, the programmer may, of course, wish to check the validity of the arguments.)
2. In the other chapters of this site, for typographic expediency, the documentation of many functions is not integrated with the code (as a // comment ) but rather embedded in the text that precedes the code.
3. There are excellent tools
for integrating code with documentation.
See, for example, the
CWEB system of literate programming
by Donald Knuth.
As a concrete illustration,
this is what a program looks like
before it is automatically converted into
C code.
There is one situation where comments embedded in the code are useful.
The body of many functions consists of an
iterative process
(controled by a for
or a while
).
In such cases,
you may enrich the documentation by stating
the invariants of the iterative process.
An invariant is a
relation between the values of the variables
that holds at the beginning of each iteration,
i.e., a relation that does not change from one iteration to the next. These invariant relations explain the iterative process and lead to a proof, by induction, of the correctness of the process, i.e., a proof that the process has the desired effect.
Example 1. The function max finds the value of a largest element of v[0..n-1]. The comment embedded in the code states the invariant of the iterative process:
int max (int n, int v[]) {
int x = v[0];
for (int j = 1; j < n; ++j)
// at this point, x is the value of
// a largest element of v[0..j-1]
if (x < v[j])
x = v[j];
return x;
}
Example 2. Let's say that a segment v[i..j] of an array v[0..n-1] is constant if all its elements have the same value. The function scmax receives an array v[0..n-1], with n > 0, and returns the length of a longest constant segment. The comment embedded in the code states the invariant of the iterative process:
int scmax (int v[], int n) { int i = 0, max = 0; while (/*A*/ i < n) { // at the point A, // 1. max is the length of a longest // constant segment of v[0..i-1] and // 2. i == 0 or v[i-1] != v[i] or i == n int j = i+1; while (j < n && v[j] == v[i]) ++j; if (max < j-i) max = j-i; i = j; } return max; }
Invariants are essential to understand why a function, or an algorithm, is correct. Correctness proofs based on invariants figure in the chapters dedicated to binary search, to basic sorting algorithms, to mergesort, to heapsort, and to quicksort.
Invariants do, sometimes, help to choose between two equivalent versions of a piece of code: the more elegant version has simpler invariants.
x is the largest element of the part of the array v examined so far.Criticize this claim.
// the function receives an increasing array // v[0..n-1] with n >= 1 and returns the // length of a longest constant segment // of the array
Answer: No. The extra parentheses are superfluos because the operators < and != take precedence over &&.