Footnotes: Computer science


Instance of a problem

An instance of a problem is a particular case of the problem, with specific values of the parameters of the problem.  Each set of specific values of the parameters defines an instance.

Consider, for example, the problem of finding the average of two numbers, say a and b.  An instance of this problem consists of finding the average of 123 and 9876.  Another instance consists of find the average of 22222 and 3434.


Correct algorithm

An algorithm is correct if it fulfills its promises, that is, if it does what it promises to do.  In other words, an algorithm is correct if it solves the problem that it is supposed to solve.


Elegant algorithm

Elegant is more or less the same as simple, clean, beautiful, economical.  An elegant program has no superfluous code and no unnecessary variables.  An elegant program does not treat special instances of the problem in separate.

A very clever, intricate, convoluted. tangled, obscure, ugly, confused program is inelegant.

Experience shows that elegante programs are, in general, efficient, and efficient programs are, usually, elegant.


Efficient algorithm

An algorithm is efficient if does not waste time.  Roughly speaking, an algorithm is efficient if it is faster than others algorithms for the same problem.

Though this definition is adequate for informal use, it hides some difficulties:

  1. The expression faster than others should include not only the already known algorithms for the problem, but also the ones not yet discovered.
  2. Given two algorithms for the same problem, one can be faster than the other for some instances of the problem and slower for other instances. To characterize efficiency, the larger instances are the most relevant.
  3. Other computational resources besides time (such as memory space, for example) could be taken into account to define efficiency.


Linear, linearithmic, quadratic

The following definitions refer to the time consumption (i.e., the response time) of algorithms.  In all the definitions,  n  is the size of an instance of the problem that the algorithm solves, i.e., the parameter that measures the size of the input to the algorithm.  (If the input is a graph, for example, then n is the the number of vertices plus the number of edges of the graph.)

An algorithm is

  1. constant if its time consumption does not depend on n,
  2. logarithmic if it consumes time proportional to  log n,
  3. linear if it consumes time proportional to  n,
  4. linearithmic (or n-log-n) if it consumes time proportional to  n log n,
  5. quadratic if it consumes time proportional to  n²,
  6. cubic if it consumes time proportional to  n³,
  7. exponencial if it consumes time proportional to  cn, where c is a constant (like 2, for example).

These terms refer, usually, to the time consumption in the worst case.


Data types

A data type is a set of values equipped with a set of operations. Some operations transform values into other values; others associate numbers to values or sets of values.  Here are some basic data types:

You can define your own data types using typedef.  The following data type, for example, is very useful to represent boolean values:

typedef enum {FALSE, TRUE} boolean;

(But this data type is already defined in the stdbool.h interface.)   Another useful data type is  string  (a sequence of bytes):

typedef char *string;

Here is a more specialized example: the data type  point  represents the coordinates of a point on the plane:

typedef struct {
   double x; 
   double y;
} point;

The operations on values of this type are derived from the operations on doubles. You can also add your own operations, like the operation that produces the Euclidean distance between two points, for example.


Real numbers and real numbers

Computers are not aware of real numbers in the mathematical sense. Computers only know a small set of rational numbers.

The world of computing loosely calls reals the numbers of type  float  and  double  (such as 0.31415926×101, for example), and all these numbers are rational.  These numbers are represented in floating point notation.


Priority queue

A priority queue is any data type equipped with two operations:

It is easy to implement a priority queue in such a way than one of the operations is fast.  It is more challenging to come up with an implementation in which both operations are fast.

The definition just given is that of a max priority queue. It is not difficult to adapt the definition to that of a min priority queue.


Heap

Do not mistake the heap data structure with the area of the computer memory used for dynamic memory allocation.  The two concepts are independent.


Binary, octal, and hexadecimal notations

Natural numbers (zero, one, two, three etc.) can be written in binary (base 2) notation, octal (base 8) notation, decimal (base 10) notation, and hexadecimal (base 16) notation.

In binary notation, the digits are 0 and 1.  In octal notation, the digits are 0 1 .. 7.  In decimal, the digits are 0 1 .. 9. And in hexadecimal, the digits are 0 1 .. 9 A B .. F.

For example, the number seventy-nine is represented by  79  in decimal notation since seventy-nine is 7×10 + 9.  The same number is represented by  1001111  in binary notation, since seventy-nine is equal to 1×26 + 1×23 + 1×22 + 1×21 + 1×20.  The same number is represented by  117  in octal notation (since seventy-nine is equal to 1×8² + 1×8¹ + 7), and by  4F  in hexadecimal notation (since seventy-nine is equal to 4×16¹ + F).

The C language uses the following convention:

For example,  0117  is the octal representation and  0x4F  is the hexadecimal representation of seventy-nine.


Letters with diacritical marks

Diacritics are the marks over letters (like á, ã, é, etc.) and the cedilla under a c to indicate accents.

The C99 standard of the C language allows letters with diacritics in names of variables and functions.  But the letters with diacritics must be represented (in the source file of your program) by means of its Unicode numbers. For example, if you wish the name of a variable to be piñacolada, you must type pi\u00F1acolada.  This complication makes the resource somewhat useless in practice…

In the case of character constants, however, the things are more friendly: letters with diacritics can be typed in directly (provided your languages environment is appropriate).  For example,

int pinacolada = 0; // piñacolada
char message[14] = "values in €";
printf( "%s\n", message);

produces

values in €

(Note that the string "values in €" occupies 14 bytes since uses 3 bytes in UTF-8 encoding.)


ISO-LATIN-1 table

Until about 2004, the bytes 10000000 to 11111111 were used to represent letters with diacritics and some others characters.  This correspondence is defined by the ISO-LATIN-1 table, also known as ISO-8859-1 table.

With the popularization of the UTF-8 code, the ISO-LATIN-1 table was deprecated. Hence, the bytes whose first bit is 1 (unsigned chars 128 to 255) no longer represent characters.


The structure of the UTF-8 code

The UTF-8 code represents each character by a sequence of 1 to 4 bytes.  The coding scheme was carefully designed to allow for the unambiguous decoding of a sequence of bytes and to be compatible with the ASCII code.

The first 128 characters on the Unicode list (numbers U+0000 to U+007F) are represented by 1 byte each.  The 1920 next characters (numbers U+0080 to U+07FF) are encoded in 2 bytes.  And so on.

The table below shows the UTF-8 code structure. In the left column, we have the intervals of Unicode numbers, in hexadecimal notation. In the right column, in binary notation, the intervals of the corresponding code bytes:

Unicode numbers      byte 1   byte 2   byte 3   byte 4

00000000 .. 0000007F 0xxxxxxx
00000080 .. 000007FF 110xxxxx 10xxxxxx
00000800 .. 0000FFFF 1110xxxx 10xxxxxx 10xxxxxx
00010000 .. 0010FFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Now, the same table written in decimal notation:

       0 .. 127      000..127    
     128 .. 2047     192..223 128..191 
    2048 .. 65535    224..239 128..191 128..191 
   65536 .. 1114111  240..247 128..191 128..191 128..191 

Finally, the same table written in hexadecimal notation:

       0 .. 7F       00..7F    
      80 .. 7FF      C0..DF   80..BF 
     800 .. FFFF     E0..EF   80..BF   80..BF 
   10000 .. 10FFFF   F0..F7   80..BF   80..BF   80..BF


Well-formed expression of parentheses and brackets

First, a very rough draft of the algorithm:

// This function decides whether a string s contains a
// well-formed sequence of parentheses and brackets.

wellformed (string s) {
     for each element c of s do {
          if c ≡ ')' then 
               if top of stack has '(' then
                    pop stack
          else if c ≡ ']' then
               if top of stack has '[' then
                    pop stack
          else push c on stack
     }
     if stack empty 
          then return yes
          else return no
}  

Now, a draft that is closer to actual code:

wellformed (string s) {
     allocate space for a stack of bytes
     for each byte c in s do {
          if c ≡ ')' then 
               if stack nonempty and top of stack is '(' then
                    pop
               else return no
          else if c ≡ ']' then 
               if stack nonempty and top of stack is '[' then
                    pop
               else return no
          else push c
     }
     if stack empty 
          then return yes
          else return no
}  


Transforming infix expression into postfix

Here is a rough pseudocode draft in of the infixToPostfix function:

// This function receives an infix expression inf
// and returns the corresponding postfix expression.

infixToPostfix (string inf) {
     for each character c of inf do {
          depending on the value of c 
               push c onto stack
               or add c to the postfix expression 
               or transfer characters from stack to postfix
     }
     add '\0' to the postfix expression 
     return postfix
}  

Now, a more precise and complete draft:

infixToPostfix (string inf) {
     allocate space for postfix string post
     allocate space for stack of bytes
     // first byte of inf is a '('
     push first byte of inf onto stack
     for each byte c of inf do {
          if c ≡ '(' then 
                push c
          else if c ≡ ')' then 
                while top of stack is different from '('
                      pop x
                      put x in post
                pop x
          else if c is '+' or '-' then
                while top of stack is different from '('
                      pop x
                      put x in post
                push c
          else if c is '*' or '/' then
                  while top of stack is neither '(' nor '+' nor '-'
                      pop x
                      put x in post
                push c
          else
                put c in post
     }
     put '\0' in post
     return post
}  


Distances between cities

Here is um draft of the distances algorithm:

// Receives matrix A of interconnections 
// between 6 cities. Returns array d such that
// d[i] is the distance from city c to city i.

distances (matrix A[0..5][0..5], int c) {
     d[c] = 0
     put c into the queue
  
     while queue is not empty { 
         let i be the first city in the queue
         for each neighbor j of i do
             if d[j] not yet defined then
                 d[j] = d[i] + 1
                 put j into queue
         remove i from queue
     }

     return d
}


Sieve for the Heapsort algorithm

Here is a draft, in pseudocode, of the sieve function:

sieve (array v[1..m]) {
     p = 1
     while 2p ≤ m do 
          let c be the most valuable child of p
          if v[p] ≥ v[c] then stop
          else interchange v[p] and v[c]
          else p = c   // p moves down the heap
}


Heapsort

Here is a draft, in pseudocode, of the heapsort function:

// Rearranges v[1..n] in increasing order.

heapsort (v[1..n]) {
     transform v[1..n] into a heap
     for m = n, n−1, ... , 2 do
          interchange v[1] and v[m]
          sieve (v[1..m-1])
}


Problem solving

Problem-solving skills are almost unanimously
the most important qualification
that employers look for […]
more than programming languages proficiency,
debugging, and system design.

What should I do when faced with a new computational problem?

  1. Do not rush to solve the problem. First, understand very well the nature of the data.  Does the data consist of a list of numbers? a single number? a sequence of characters? a graph? a matrix?  Are the numbers integer or fractional? are they all positive? are they pairwise distinct or there may be repetitions?
  2. Understand very well the nature of the solutions to the problem.  Does a solution consist of a single number? a list of numbers? a matrix? a list of characters? a graph?  Does every instance of the problem have a solution?
  3. Spend all the time you need to understand the relation that must exist between the data and the solutions of the problem.  You must be able to describe, even if only informally, an algorithm that receives the data of the problem together with a candidate solution and decides whether the candidate is indeed a solution.

    To properly understand a problem, write its statement on a piece of papel, draw a diagram, explain the problem to a skeptical colleague.  Try to solve some simple instances of the problem. Try to solve extreme instances, like the very small and the ones that have no solution.

  4. Resist the temptation to begin writing code immediately. Do a little planning before you start to program.  If you start programming too soon you run the risk of not seeing the forest for the trees.

(This note was inspired in the article How to think like a programmer — lessons in problem solving, by Richard Reis.)


Dijkstra and the teaching of computing & programming

Informatica gaat net zo min over computers
als astronomie over telescopen.

Computer science is no more about computers
than astronomy is about telescopes.

— Edsger W. Dijkstra

Edsger W. Dijkstra had strong and original opinions about the teaching of programming and computing. Here are some excerpts from his manuscript EWD1036 (On the cruelty of really teaching computing science):

So, if I look into my foggy crystal ball at the future of computing science education, I overwhelmingly see the depressing picture of 'Business as usual'. The universities will continue to lack the courage to teach hard science, they will continue to misguide the students, and each next stage of infantilization of the curriculum will be hailed as educational progress.

We could, for instance, begin with cleaning up our language by no longer calling a bug a bug but by calling it an error. It is much more honest because it squarely puts the blame where it belongs, viz. with the programmer who made the error. The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation. The nice thing of this simple change of vocabulary is that it has such a profound effect: while, before, a program with only one bug used to be 'almost correct', afterwards a program with an error is just 'wrong'.

The statement that a given program meets a certain specification amounts to a statement about all computations that could take place under control of that given program. And since this set of computations is defined by the given program, [we must] deal with all computations possible under control of a given program by ignoring them and working with the program.  We must learn to work with program texts while (timerarily) ignoring that they admit the interpretation of executable code.

Needless to say that, all by itself, a program is no more than half a conjecture. The other half of the conjecture is the functional specification the program is supposed to satisfy. The programmer's task is to present such complete conjectures as proven theorems.

Right from the beginning, and all through the [introductory programming] course, we [should] stress that the programmer's task is not just to write down a program, but that his main task is to give a formal proof that the program he proposes meets the equally formal functional specification.  [...]  Finally, in order to drive home the message that this introductory programming course is primarily a course in formal mathematics, we see to it that the programming language in question has not been implemented on campus so that students are protected from the temptation to test their programs. And this concludes the sketch of my proposal for an introductory programming course for freshmen.

E. W. Dijkstra Archive
collection of manuscripts of Edsger W. Dijkstra

See also Dijkstra's interview Discipline in Thought in 2001.


Humor