Bytes, numbers, and characters

01000010 01111001 01110100 01100101 01110011 00101100 00100000
01101110 01110101 01101101 01100010 01100101 01110010 01110011
00101100 00100000 01100011 01101000 01100001 01110010 01100001
         01100011 01110100 01100101 01110011 01110011 00001010

The memory of any computer is a sequence of bytes.  Each byte is a sequence of 8 bits (binary digits) and therefore has 28 = 256 possible values:

00000000 00000001 00000010 ⋮ 11111110 11111111

A sequence of consecutive bytes in memory can be interpreted in three different ways:

This chapter discusses these three interpretations.

Table of contents:

Natural numbers and binary notation

Every sequence of bits can be seen as a natural number in binary notation:  the number is the sum of the powers of 2 that correspond to the 1 bits.  For example, the sequence  1101  represents the number  23 + 22 + 20,  which is equal to 13.  The sequence  1111  represents  23 + 22 + 21 + 20,  which is equal to 15.

Every sequence of s bytes — that is, 8s bits — represents a natural number in the closed interval

0 . . 28s−1.

If s = 1, for example, the interval goes from 0 to 28−1, i.e., from 0 to 255.  If s = 2, the interval goes up to 216−1, i.e., 65535.  If s = 4, the interval goes up to 232−1, i.e., 4294967295.

Example.  In order to make the example fit on the page, we take s = 1 and pretend that each byte has only 4 bits. A sequence of 4 bits represents, in binary notation, a number in the interval  0 . . 24−1:

byte number
0000 0
0001 1
0010 2
0011 3
0100 4
0101 5
0110 6
0111 7
1000 8
1001 9
1010 10
1011 11
1100 12
1101 13
1110 14
1111 15

Exercises 1

  1. Show that every natural number can be written in binary notation.
  2. Show that  2k + 2k−1 + … + 21 + 20 = 2k+1−1, for any natural number k.
  3. Write the numbers 28, 28−1, 216, 216−1, 232, and 232−1 in hexadecimal notation.

Integers and two's complement notation

Let s be a nonnull natural number. Every sequence of s bytes — that is, 8s bits — can be interpreted as an integer in the closed interval

−28s−1 . .  28s−1−1 .

If s = 1, for example, this interval goes from −27 to 27−1, i.e., from −128 a 127.  If s = 2, the interval goes from −215 to 215−1, i.e., from −32768 to 32767.  If s = 4, the interval goes from −231 to 231−1, i.e., from −2147483648 to 2147483647.

What integer does a given sequence of 8s bits represent?  Begin by interpreting the sequence as a natural number in binary notation. Let's say that this number is n. If the first bit of the sequence is 0, the sequence represents the positive integer n. If the first bit is 1, the sequence represents the strictly negative integer  n − 28s.  This way of representing integers is known as two's complement notation.

Example.  For the example to fit on the page, we take s = 1 and pretend that each byte has only 4 bits. Any such sequence of bits represents an integer in the interval −23 . . 23−1 :

byte integer
0000 +0
0001 +1
0010 +2
0011 +3
0100 +4
0101 +5
0110 +6
0111 +7
1000 −8
1001 −7
1010 −6
1011 −5
1100 −4
1101 −3
1110 −2
1111 −1

Exercises 2

  1. Complement of n.  We have shown above how the two's complement notation transforms any sequence of s bytes whose first bit is 1 into a negative integer. Now consider the converse operation. Given an integer n in the interval −28s−1 . . −1, show that the sequence of s bytes that represents n in two's complement notation is equal to the sequence of bytes that represents the natural number n + 28s in binary notation.
  2. Two's complement.  The two's complement notation transforms any sequence of s bytes whose first bit is 1 into a negative integer. Now consider the converse operation. Suppose that n is um integer in the interval −28s−1 . . −1. Take the sequence of bits that represents the absolute value of n in binary notation; complement all the bits (that is, change 0s into 1s and vice versa) and add 1, in binary, to the result. Show that this operation produces the sequence of s bits that represents n in two's complement notation.
  3. An alternative for two's complement?  Suppose, as we did in the example above, that we have only 4 bits to represent integers. Now consider the following interpretation of such a sequence of 4 bits. Let n be the positive integer represented by the last three bits in binary notation. If the first bit is 0, then the whole sequence represents the positive integer n. If the first bit is 1, then the whole sequence represents the negative integer n.  (For example, the sequence 1101 represents −5.)  Discuss the disadvantages of representing integers in this way.
  4. Write the numbers 27, 27−1, 215, 215−1, 231, and 231−1 in hexadecimal notation.

Characters and the ASCII table

A character is any typographic symbol (letter, digit, punctuation mark, and so on).  Examples of characters:  @, A, B, C, a, b, c, +, -, *, /, =, £, À, ñ, ó, ≤, ≠ (Do not confuse the idea of character with the char type of the C language.)

In this chapter, we consider only the small set of 128 characters known as the ASCII alphabet. This set includes the characters

   ! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 
: ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
[ \ ] ^ _ `
a b c d e f g h i j k l m n o p q r s t u v w x y z
{ | } ~

(the first character is a blank space) and a few others.

Every byte whose first bit is 0 represents a character in the ASCII alphabet. The correspondence between bytes and characters is defined by the ASCII table. Here is a small sample of that table:

byte character
00111111 ?
01000000 @
01000001 A
01000010 B
01000011 C
01100001 a
01100010 b
01100011 c
01111110 ~

We use verbal shortcuts to refer to ASCII characters.  For example, rather than saying the character A we can say the character 65, since the byte that corresponds to A in the ASCII table is the representation of 65 in binary notation.

Control characters.  Besides the ninety-five normal characters, the ASCII alphabet contains thirty-three control characters. These characters are not typographic symbols like the others and therefore are indicated by a special notation: a backslash followed by a digit or a letter. Here are the most used control characters:

byte character name
00000000 \0 null character
00001001 \t horizontal tabulation (tab)
00001010 \n end of line (newline)
00001011 \v vertical tabulation
00001100 \f end of page (new page)
00001101 \r carriage return

The character  \0  is used to mark the end of a string and takes no space when displayed; the character  \n  signals the end of a line of text and produces a jump to a new line when displayed; the character  \f  signals the end of a page; and so on.  Though the space (character 32) is not a control character, it can be indicated by (backslash followed by a space).

The characters  \t\n\v\f,  and  \r  are collectively known as white-spaces. Many functions of the standard libraries treat all the white-space characters as if they were spaces.

Non-ASCII characters.  If you only use English, the ASCII alphabet is likely all you need. However, you should be aware that the ASCII alphabet lacks many letters from other languages, for example letters with diacritics such as À, ñ, ó, etc., and special symbols such as £, ≤, ≠, etc.  Each of these characters is represented by two or more consecutive bytes in a coding scheme known as UTF-8.  More about this will be said in chapter Strings and character chains and in chapter Unicode and UTF-8.

Exercises 3

  1. What bytes represent the characters  O,  o,  0  and  \0?
  2. Write the bytes 01000001, 01000010, and 01000011 in hexadecimal notation.
  3. Write, in decimal notation, the sequence of bytes that represents the text  A byte has 8 bits..
  4. Consider the bytes that represent the natural numbers  39 65 39 32 105 115 54 53 10 39 97 39 32 105 115 57 55  in binary notation.  What is the sequence of characters represented by this sequence of bytes?
  5. The epigraph at the top of this page is a sequence of ASCII characters. Decode the epigraph.
  6. Byte inspection.  Study the documentation of the programs od and hexdump (the names are shorthands of octal dump and hexadecimal dump).  These utilities display, byte-by-byte, the contents of any file.

Questions and answers