2 Data Representations in Computers
The contents of computer storage are strings of 1's and 0's. Different types of data, e.g., characters, integers, and real numbers, have different representations in computers. In this chapter, we first give an overview of the prefixes, including binary prefixes, that quantify computer storage, length, and time. The numeral system most frequently used in a computer is not the decimal system; the three most common are the binary, octal, and hexadecimal numeral systems. We will explain these three numeral systems and the data representations of characters, integers, and real numbers. Finally, we describe the primitive data types of the C programming language.
The smallest unit of data storage in a computer is a bit. A single bit has a binary state: 0 or 1, true or false, or any two mutually exclusive states. A collection of bits, usually of length 8, is called a byte, which is the basic unit of computer storage. Another term, "word", refers to a run of consecutive bytes whose length equals that of the CPU's largest register or largest addressable memory location. For example, for IBM-compatible PCs, a word is 16 bits, i.e., 2 bytes, because of the early Intel 80x86 processors. For many recent CPUs, however, a word is 32 bits or even 64 bits.
Binary prefixes quantify large amounts of bits, bytes, and words.
Name | Base 2 | Base 10 | Numeral | Name | Base 2 | Base 10 | Numeral |
kilo | $2^{10}$ | $\approx 10^{3}$ | thousand | peta | $2^{50}$ | $\approx 10^{15}$ | quadrillion |
mega | $2^{20}$ | $\approx 10^{6}$ | million | exa | $2^{60}$ | $\approx 10^{18}$ | quintillion |
giga | $2^{30}$ | $\approx 10^{9}$ | billion | zetta | $2^{70}$ | $\approx 10^{21}$ | sextillion |
tera | $2^{40}$ | $\approx 10^{12}$ | trillion | yotta | $2^{80}$ | $\approx 10^{24}$ | septillion |
The first three names are the ones mostly used in computer science. The term kilobyte means $2^{10}$ bytes, i.e., 1,024 bytes, about 1,000 bytes; the term megabyte means $2^{20}$ bytes, i.e., 1,048,576 bytes, about 1,000,000 bytes; the term gigabyte means $2^{30}$ bytes, i.e., 1,073,741,824 bytes, about 1,000,000,000 bytes. The three terms are usually abbreviated as KB, MB, and GB, respectively. Hence, a 1.8 KB file has a size of about 1,800 bytes; a 128 MB RAM has a size of about 128,000,000 bytes; and a 40 GB hard disk has a size of about 40,000,000,000 bytes.
There are also prefixes for quantifying small amounts of time, length, and weight. These prefixes are expressed in base 10.
Name | Base 10 | Numeral | Name | Base 10 | Numeral |
milli | $10^{-3}$ | thousandth | nano | $10^{-9}$ | billionth |
micro | $10^{-6}$ | millionth | pico | $10^{-12}$ | trillionth |
One millisecond, 1 ms, is one thousandth of a second, i.e., $10^{-3}$ seconds. One microsecond, 1 µs, is one millionth of a second, i.e., $10^{-6}$ seconds. One nanosecond, 1 ns, is one billionth of a second, i.e., $10^{-9}$ seconds.
Several terminologies are frequently used in computer science. We give some examples below:
Description | Notation | Meaning |
network bandwidth | 128 kbps | 128 kilobits per second |
frequency of CPU clock | 600 MHz | 600 megahertz = $600 \times 10^{6}$ clock cycles per second |
microprocessor speed (in terms of instructions) | 600 MIPS | 600 mega (million) instructions per second |
microprocessor speed (in terms of operations) | 1.4 GFlops | 1.4 giga (billion) floating-point operations per second |
2.2 Binary Numerals, Octal Numerals, and Hexadecimal Numerals
The numeral system we use daily is decimal numerals, i.e., base 10 numbers. In the decimal number system, there are ten symbols, 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, with decimal points and plus (+) and minus (-) signs to express decimal numbers.
Three other numeral systems are used in computer science: binary numerals (base 2), octal numerals (base 8), and hexadecimal numerals (base 16). Binary numerals have only two digit symbols, 0 and 1. Octal numerals are composed of 8 digit symbols: 0, 1, 2, 3, 4, 5, 6, and 7. Hexadecimal numerals are composed of 16 digit symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. The letters A, B, C, D, E, and F denote the decimal values 10, 11, 12, 13, 14, and 15, respectively. The following table shows the equivalent decimal, binary, octal, and hexadecimal numbers from $0_{10}$ to $15_{10}$.
Dec. | Bin. | Oct. | Hex. | Dec. | Bin. | Oct. | Hex. |
0 | 0000 | 00 | 0 | 8 | 1000 | 10 | 8 |
1 | 0001 | 01 | 1 | 9 | 1001 | 11 | 9 |
2 | 0010 | 02 | 2 | 10 | 1010 | 12 | A |
3 | 0011 | 03 | 3 | 11 | 1011 | 13 | B |
4 | 0100 | 04 | 4 | 12 | 1100 | 14 | C |
5 | 0101 | 05 | 5 | 13 | 1101 | 15 | D |
6 | 0110 | 06 | 6 | 14 | 1110 | 16 | E |
7 | 0111 | 07 | 7 | 15 | 1111 | 17 | F |
A decimal number, say 25084, is actually an expression of powers of 10, i.e.,
$$25084 = 2\times10^4 + 5\times10^3 + 0\times10^2 + 8\times10^1 + 4\times10^0.$$
Similarly, a binary number, say $101101_2$, is actually an expression of powers of 2, i.e.,
$$101101_2 = 1\times2^5 + 0\times2^4 + 1\times2^3 + 1\times2^2 + 0\times2^1 + 1\times2^0 = 45.$$
Note that when there is a chance of ambiguity, a subscript is used to indicate the base of a number. Often, terms with coefficient 0 are omitted and $101101_2$ is written as:
$$101101_2 = 1\times2^5 + 1\times2^3 + 1\times2^2 + 1\times2^0.$$
An octal number, say $763_8$, is an expression of powers of 8, i.e.,
$$763_8 = 7\times8^2 + 6\times8^1 + 3\times8^0 = 499.$$
A hexadecimal number, say $9E4B_{16}$, is an expression of powers of 16, i.e.,
$$9E4B_{16} = 9\times16^3 + 14\times16^2 + 4\times16^1 + 11\times16^0 = 40523.$$
For a number with a fixed number of digits, leading zeros are added if the number is shorter than the given length. For example, if we designate a binary number of 8 bits, then $101101_2$ is often written as $00101101_2$. We now show examples of converting a number from one numeral system to the others.
Example 1: Convert decimal number 40206 to a binary number, an octal number, and a hexadecimal number.
The stepwise conversion to a binary number is illustrated below:
Input: decimal number d = 40206. Output: binary number b of the same value as d.
Operational steps: |
Initial: (d, b) = (40206, ) |
40206 ÷ 2 = 20103 remainder 0 | (20103, 0) |
20103 ÷ 2 = 10051 remainder 1 | (10051, 10) |
10051 ÷ 2 = 5025 remainder 1 | (5025, 110) |
5025 ÷ 2 = 2512 remainder 1 | (2512, 1110) |
2512 ÷ 2 = 1256 remainder 0 | (1256, 01110) |
1256 ÷ 2 = 628 remainder 0 | (628, 001110) |
628 ÷ 2 = 314 remainder 0 | (314, 0001110) |
314 ÷ 2 = 157 remainder 0 | (157, 00001110) |
157 ÷ 2 = 78 remainder 1 | (78, 100001110) |
78 ÷ 2 = 39 remainder 0 | (39, 0100001110) |
39 ÷ 2 = 19 remainder 1 | (19, 10100001110) |
19 ÷ 2 = 9 remainder 1 | (9, 110100001110) |
9 ÷ 2 = 4 remainder 1 | (4, 1110100001110) |
4 ÷ 2 = 2 remainder 0 | (2, 01110100001110) |
2 ÷ 2 = 1 remainder 0 | (1, 001110100001110) |
∴ $40206_{10} = 1001\ 1101\ 0000\ 1110_2$ (the final quotient 1 becomes the leading bit)
The stepwise conversion to an octal number is illustrated below:
Input: decimal number d = 40206. Output: octal number o of the same value as d.
Operational steps: |
Initial: (d, o) = (40206, ) |
40206 ÷ 8 = 5025 remainder 6 | (5025, 6) |
5025 ÷ 8 = 628 remainder 1 | (628, 16) |
628 ÷ 8 = 78 remainder 4 | (78, 416) |
78 ÷ 8 = 9 remainder 6 | (9, 6416) |
9 ÷ 8 = 1 remainder 1 | (1, 16416) |
∴ $40206_{10} = 116\ 416_8$
The stepwise conversion to a hexadecimal number is illustrated below:
Input: decimal number d = 40206. Output: hexadecimal number h of the same value as d.
Operational steps: |
Initial: (d, h) = (40206, ) |
40206 ÷ 16 = 2512 remainder 14 | (2512, E) |
2512 ÷ 16 = 157 remainder 0 | (157, 0E) |
157 ÷ 16 = 9 remainder 13 | (9, D0E) |
∴ $40206_{10} = 9D0E_{16}$
If the binary number 1001110100001110 is grouped into 3 bits each from right to left, 1 001 110 100 001 110, it is easy to convert each group of three bits to an octal digit and obtain $116416_8$. Similarly, if the binary number 1001110100001110 is grouped into 4 bits each from right to left, 1001 1101 0000 1110, it is easy to convert each group of four bits to a hexadecimal digit and obtain $9D0E_{16}$. Conversely, it is possible to convert a decimal number into an octal or a hexadecimal number and then convert each octal or hexadecimal digit into binary bits. This procedure requires fewer division steps when converting a large decimal number to a binary one.
Similarly to decimal arithmetic, we can add and multiply pairs of binary, octal, and hexadecimal numbers. The addition of two binary numbers, 10010010 + 00111011, is shown below:
carry |   | 1 | 1 |   |   | 1 |   |   |
      | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
    + | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 |
      | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
The addition of two octal numbers, 2614 + 1356, is similar:
carry | 1 |   | 1 |   |
      | 2 | 6 | 1 | 4 |
    + | 1 | 3 | 5 | 6 |
      | 4 | 1 | 7 | 2 |
Let us verify the addition: $2614_8 + 1356_8 = 1420_{10} + 750_{10} = 2170_{10} = 4172_8$. We leave the addition of two hexadecimal numbers as an exercise.
The multiplication of two binary numbers, 10110 × 1010, is shown below:
      |   |   |   | 1 | 0 | 1 | 1 | 0 |
    × |   |   |   |   | 1 | 0 | 1 | 0 |
      |   |   |   | 0 | 0 | 0 | 0 | 0 |
      |   |   | 1 | 0 | 1 | 1 | 0 |   |
      |   | 0 | 0 | 0 | 0 | 0 |   |   |
      | 1 | 0 | 1 | 1 | 0 |   |   |   |
      | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
To multiply octal numbers, we first consider products of two single octal digits, e.g., $5_8 \times 7_8 = 43_8$, $6_8 \times 7_8 = 52_8$, and $3_8 \times 7_8 = 25_8$, and then products of a multi-digit number and a single digit, e.g., $563_8 \times 7_8 = 4300_8 + 520_8 + 25_8 = 5045_8$. The multiplication of two octal numbers, 563 × 247, is illustrated below:
      |   |   |   | 5 | 6 | 3 |
    × |   |   |   | 2 | 4 | 7 |
      |   |   | 5 | 0 | 4 | 5 |
      |   | 2 | 7 | 1 | 4 |   |
      | 1 | 3 | 4 | 6 |   |   |
      | 1 | 7 | 1 | 0 | 0 | 5 |
Binary, octal, and hexadecimal numbers with fractional parts work like decimal numbers with fractional parts. For example, the binary fractional number 11.101 is converted to a decimal fractional number as below:
$$11.101_2 = 1\times2^1 + 1\times2^0 + 1\times2^{-1} + 0\times2^{-2} + 1\times2^{-3} = 2 + 1 + 0.5 + 0.125 = 3.625_{10}.$$
The conversion of a decimal fractional number to a binary number is done in two steps: first convert the integer part, and then convert the fractional part. For example, to convert $3.625_{10}$ to a binary fractional number, first convert the integer part $3_{10}$ to $11_2$ as before. The second step is to convert the fractional part $0.625_{10}$ to a binary fraction by repeatedly multiplying by 2 and taking the integer part of each product as the next bit. The fraction conversion is illustrated below.
Input: decimal fraction d = 0.625. Output: binary fraction b of the same value as d.
Operational steps: |
Initial: (d, b) = (0.625, 0.) |
0.625 × 2 = 1.25 | (0.25, 0.1) |
0.25 × 2 = 0.5 | (0.5, 0.10) |
0.5 × 2 = 1.0 | (0.0, 0.101) |
∴ $0.625_{10} = 0.101_2$
Therefore, the decimal number $3.625_{10}$ is converted to $11.101_2$. However, the conversion of decimal fractions to binary fractions is not always exact. Let us try to convert $0.8_{10}$ to a binary fraction.
Operational steps: |
Initial: (d, b) = (0.8, 0.) |
0.8 × 2 = 1.6 | (0.6, 0.1) |
0.6 × 2 = 1.2 | (0.2, 0.11) |
0.2 × 2 = 0.4 | (0.4, 0.110) |
0.4 × 2 = 0.8 | (0.8, 0.1100) |
0.8 × 2 = 1.6 | (0.6, 0.11001) |
0.6 × 2 = 1.2 | (0.2, 0.110011) |
0.2 × 2 = 0.4 | (0.4, 0.1100110) |
0.4 × 2 = 0.8 | (0.8, 0.11001100) |
… | … |
0.4 × 2 = 0.8 | (0.8, 0.110011001100…1100) |
∴ $0.8_{10} = 0.110011001100\ldots_2$ (a repeating binary fraction)
In practice, a decimal fraction is converted to a binary fraction of a specific length that approximates it. For example, if the binary fraction is fixed to 8 bits, $0.8_{10}$ is converted to $0.11001100_2$, which is actually $0.796875_{10}$, an approximation of $0.8_{10}$.
Signed binary numbers are also expressed in a specific number of bits. We will consider 8-bit signed binary numbers. A positive number, say $37_{10}$, is expressed as $00100101_2$. There are two methods for representing negative binary numbers. The first is one's complement, in which a negative number is the bitwise NOT of the corresponding positive number, e.g., $-37_{10} = \overline{00100101}_2 = 11011010_2$. With 8-bit one's complement, only integers ranging from $-127_{10}$ ($10000000_2$) to $127_{10}$ ($01111111_2$) can be expressed. For example, $320_{10}$ cannot be expressed as an 8-bit signed binary number, since $320_{10} = 1\ 0100\ 0000_2$ requires at least 9 bits. In one's complement, the most significant bit is used as the sign bit: if it is 0, the number is positive; if it is 1, the number is negative. One's complement has the drawback of two zeros, $00000000_2$ and $11111111_2$.
The second representation of signed binary numbers is two's complement. For an 8-bit binary number, the most significant bit is again used as the sign bit. The two's complement representation of a negative number is obtained by taking the one's complement of its absolute value and then adding one. For example, $-37_{10} = \overline{00100101}_2 + 1_2 = 11011010_2 + 1_2 = 11011011_2$. 8-bit signed numbers in two's complement range from $-128_{10}$ ($10000000_2$) to $127_{10}$ ($01111111_2$). Note that $+128_{10}$ cannot be represented as an 8-bit two's complement number.
Two's complement representation is well suited to binary arithmetic. Consider 8-bit signed binary numbers. Addition of two positive numbers is the same as before: $27_{10} + 12_{10} = 00011011_2 + 00001100_2 = 00100111_2$. Additions involving negative numbers are illustrated below:
carry | 1 | 1 | 1 | 1 |   |   |   |   |   |
      |   | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | ($27_{10}$)
    + |   | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | ($-12_{10} = \overline{00001100}_2 + 1_2 = 11110011_2 + 1_2 = 11110100_2$)
      |   | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | ($15_{10}$; the carry out of the most significant bit is discarded)
carry |   |   |   |   | 1 | 1 |   |   |   |
      |   | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | ($-27_{10} = \overline{00011011}_2 + 1_2 = 11100100_2 + 1_2 = 11100101_2$)
    + |   | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | ($12_{10}$)
      |   | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | ($-15_{10} = \overline{00001111}_2 + 1_2 = 11110000_2 + 1_2 = 11110001_2$)
carry | 1 | 1 | 1 |   |   | 1 |   |   |   |
      |   | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | ($-27_{10}$)
    + |   | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | ($-12_{10}$)
      |   | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | ($-39_{10} = \overline{00100111}_2 + 1_2 = 11011000_2 + 1_2 = 11011001_2$)
To subtract one positive number from another, first convert the second operand to its two's complement and then add the two numbers. For example,
$$27_{10} - 12_{10} = 27_{10} + (-12_{10}) = 00011011_2 + 11110100_2 = 00001111_2 = 15_{10}.$$
However, the following addition results in an overflow error and is considered an invalid operation.
carry |   | 1 |   |   |   |   | 1 |   |   |
      |   | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | ($75_{10}$)
    + |   | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | ($82_{10}$)
      |   | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | (Overflow! $10011101_2$ is the negative number $-99_{10}$.)
To multiply signed numbers, convert any negative operand to its absolute value, calculate the product of the two positive numbers, and then convert the product to its two's complement representation if exactly one of the operands is negative.
Characters are represented in a computer as bit strings. The most popular encoding scheme for English characters is ASCII (American Standard Code for Information Interchange). Each ASCII character occupies one byte, but only seven bits are used; the most significant bit is always 0. The ASCII character set is divided into two categories: control characters and printable characters. The printable characters have decimal values from 32 to 126. The ASCII table is shown below.
Dec. | Oct. | Hex. | Bin. | Character | Description |
0 | 000 | 00 | 00000000 | NUL | Null character |
1 | 001 | 01 | 00000001 | SOH | Start of header |
2 | 002 | 02 | 00000010 | STX | Start of text |
3 | 003 | 03 | 00000011 | ETX | End of text |
4 | 004 | 04 | 00000100 | EOT | End of transmission |
5 | 005 | 05 | 00000101 | ENQ | Enquiry |
6 | 006 | 06 | 00000110 | ACK | Acknowledgement |
7 | 007 | 07 | 00000111 | BEL | Bell |
8 | 010 | 08 | 00001000 | BS | Backspace |
9 | 011 | 09 | 00001001 | HT | Horizontal tab |
10 | 012 | 0A | 00001010 | LF | Line feed |
11 | 013 | 0B | 00001011 | VT | Vertical tab |
12 | 014 | 0C | 00001100 | FF | Form feed |
13 | 015 | 0D | 00001101 | CR | Carriage return |
14 | 016 | 0E | 00001110 | SO | Shift out |
15 | 017 | 0F | 00001111 | SI | Shift in |
16 | 020 | 10 | 00010000 | DLE | Data link escape |
17 | 021 | 11 | 00010001 | DC1 | Device control 1 (XON) |
18 | 022 | 12 | 00010010 | DC2 | Device control 2 |
19 | 023 | 13 | 00010011 | DC3 | Device control 3 (XOFF) |
20 | 024 | 14 | 00010100 | DC4 | Device control 4 |
21 | 025 | 15 | 00010101 | NAK | Negative acknowledgement |
22 | 026 | 16 | 00010110 | SYN | Synchronous idle |
23 | 027 | 17 | 00010111 | ETB | End of transmission block |
24 | 030 | 18 | 00011000 | CAN | Cancel |
25 | 031 | 19 | 00011001 | EM | End of medium |
26 | 032 | 1A | 00011010 | SUB | Substitute |
27 | 033 | 1B | 00011011 | ESC | Escape |
28 | 034 | 1C | 00011100 | FS | File separator |
29 | 035 | 1D | 00011101 | GS | Group separator |
30 | 036 | 1E | 00011110 | RS | Record separator |
31 | 037 | 1F | 00011111 | US | Unit separator |
32 | 040 | 20 | 00100000 | SP | Space |
33 | 041 | 21 | 00100001 | ! | |
34 | 042 | 22 | 00100010 | " | |
35 | 043 | 23 | 00100011 | # | |
36 | 044 | 24 | 00100100 | $ | |
37 | 045 | 25 | 00100101 | % | |
38 | 046 | 26 | 00100110 | & | |
39 | 047 | 27 | 00100111 | ' | |
40 | 050 | 28 | 00101000 | ( | |
41 | 051 | 29 | 00101001 | ) | |
42 | 052 | 2A | 00101010 | * | |
43 | 053 | 2B | 00101011 | + | |
44 | 054 | 2C | 00101100 | , | |
45 | 055 | 2D | 00101101 | - | |
46 | 056 | 2E | 00101110 | . | |
47 | 057 | 2F | 00101111 | / | |
48 | 060 | 30 | 00110000 | 0 | |
49 | 061 | 31 | 00110001 | 1 | |
50 | 062 | 32 | 00110010 | 2 | |
51 | 063 | 33 | 00110011 | 3 | |
52 | 064 | 34 | 00110100 | 4 | |
53 | 065 | 35 | 00110101 | 5 | |
54 | 066 | 36 | 00110110 | 6 | |
55 | 067 | 37 | 00110111 | 7 | |
56 | 070 | 38 | 00111000 | 8 | |
57 | 071 | 39 | 00111001 | 9 | |
58 | 072 | 3A | 00111010 | : | |
59 | 073 | 3B | 00111011 | ; | |
60 | 074 | 3C | 00111100 | < | |
61 | 075 | 3D | 00111101 | = | |
62 | 076 | 3E | 00111110 | > | |
63 | 077 | 3F | 00111111 | ? | |
64 | 100 | 40 | 01000000 | @ | |
65 | 101 | 41 | 01000001 | A | |
66 | 102 | 42 | 01000010 | B | |
67 | 103 | 43 | 01000011 | C | |
68 | 104 | 44 | 01000100 | D | |
69 | 105 | 45 | 01000101 | E | |
70 | 106 | 46 | 01000110 | F | |
71 | 107 | 47 | 01000111 | G | |
72 | 110 | 48 | 01001000 | H | |
73 | 111 | 49 | 01001001 | I | |
74 | 112 | 4A | 01001010 | J | |
75 | 113 | 4B | 01001011 | K | |
76 | 114 | 4C | 01001100 | L | |
77 | 115 | 4D | 01001101 | M | |
78 | 116 | 4E | 01001110 | N | |
79 | 117 | 4F | 01001111 | O | |
80 | 120 | 50 | 01010000 | P | |
81 | 121 | 51 | 01010001 | Q | |
82 | 122 | 52 | 01010010 | R | |
83 | 123 | 53 | 01010011 | S | |
84 | 124 | 54 | 01010100 | T | |
85 | 125 | 55 | 01010101 | U | |
86 | 126 | 56 | 01010110 | V | |
87 | 127 | 57 | 01010111 | W | |
88 | 130 | 58 | 01011000 | X | |
89 | 131 | 59 | 01011001 | Y | |
90 | 132 | 5A | 01011010 | Z | |
91 | 133 | 5B | 01011011 | [ | |
92 | 134 | 5C | 01011100 | \ | |
93 | 135 | 5D | 01011101 | ] | |
94 | 136 | 5E | 01011110 | ^ | |
95 | 137 | 5F | 01011111 | _ | |
96 | 140 | 60 | 01100000 | ` | |
97 | 141 | 61 | 01100001 | a | |
98 | 142 | 62 | 01100010 | b | |
99 | 143 | 63 | 01100011 | c | |
100 | 144 | 64 | 01100100 | d | |
101 | 145 | 65 | 01100101 | e | |
102 | 146 | 66 | 01100110 | f | |
103 | 147 | 67 | 01100111 | g | |
104 | 150 | 68 | 01101000 | h | |
105 | 151 | 69 | 01101001 | i | |
106 | 152 | 6A | 01101010 | j | |
107 | 153 | 6B | 01101011 | k | |
108 | 154 | 6C | 01101100 | l | |
109 | 155 | 6D | 01101101 | m | |
110 | 156 | 6E | 01101110 | n | |
111 | 157 | 6F | 01101111 | o | |
112 | 160 | 70 | 01110000 | p | |
113 | 161 | 71 | 01110001 | q | |
114 | 162 | 72 | 01110010 | r | |
115 | 163 | 73 | 01110011 | s | |
116 | 164 | 74 | 01110100 | t | |
117 | 165 | 75 | 01110101 | u | |
118 | 166 | 76 | 01110110 | v | |
119 | 167 | 77 | 01110111 | w | |
120 | 170 | 78 | 01111000 | x | |
121 | 171 | 79 | 01111001 | y | |
122 | 172 | 7A | 01111010 | z | |
123 | 173 | 7B | 01111011 | { | |
124 | 174 | 7C | 01111100 | | | Vertical bar |
125 | 175 | 7D | 01111101 | } | |
126 | 176 | 7E | 01111110 | ~ | |
127 | 177 | 7F | 01111111 | DEL | Delete |
A computer does not process English text only; it must also be able to deal with text in other languages such as Chinese, Japanese, and Spanish. To accommodate other languages, various character encoding schemes have been developed. For example, two Chinese character encoding schemes are used in different regions: Big5 for traditional Chinese characters and GB for simplified Chinese characters. Both Big5 and GB are 16-bit encoding schemes.
Unicode is an international standard for encoding the characters of all written human languages. It is maintained by the Unicode Consortium and kept synchronized with ISO/IEC 10646 of the International Organization for Standardization (ISO). Unicode has been adopted by many computer systems to support multilingual environments, including Microsoft Windows systems such as Windows NT, Windows 2000, and Windows XP, and Unix-based operating systems such as GNU/Linux, BSD, and Mac OS X. Unicode text can be stored in several encoding forms, including UTF-8, which is built from 8-bit units, and UTF-16, which is built from 16-bit units.
In mathematics, the set of integers has an infinite number of elements. However, a computer has only finite resources, i.e., a given amount of memory, so it is not possible to represent all integers in a computer. There are two basic types of integer representations: unsigned and signed integers.
Unsigned integers are non-negative integers and are represented by their binary values with a fixed number of bits. For Intel 80x86 processors, a word is 16 bits long and an integer is represented as a 16-bit binary number. However, a 16-bit string is too long to read comfortably, so a 16-bit binary number is usually written as a four-digit hexadecimal number. For example, 5876 is represented as the binary number 0001 0110 1111 0100 and is written as the hexadecimal number 16F4. As another example, 39741 is represented as the binary number 1001 1011 0011 1101 and the hexadecimal number 9B3D. Note that the most significant bit of 39741 is one, but it is a positive integer. For 16-bit unsigned integers, the values range from 0 to $2^{16}-1$ ($65535_{10}$).
Signed integers include zero, negative integers, and positive integers and are represented using two's complement. For a 16-bit processor, the most significant bit is used as the sign bit. The values of a signed integer range from $-2^{15}$ ($-32768_{10}$) to $2^{15}-1$ ($32767_{10}$). Usually, the representation of a signed integer is expressed in hexadecimal for easy reading. The positive integer $5876_{10}$ is still represented as $16F4_{16}$. However, $39741_{10}$ is too large to be represented as a 16-bit signed integer; its hexadecimal pattern $9B3D_{16}$ is in fact the two's complement representation of the negative integer $-25795_{10}$.
Real numbers are represented in a computer using two methods: fixed-point format and floating-point format. In general, a computer cannot represent a real number precisely. For example, 1/3 = 0.3333... cannot be stored exactly in a finite amount of memory. Both fixed-point and floating-point numbers are only approximations of real numbers.
The representation of fixed-point numbers uses a specific number of bits: some of the bits store the integer part and the rest store the fractional part after the radix point. Consider a 16-bit fixed-point representation in which the higher eight bits are the integer part and the lower eight bits are the fractional part. For example, the fixed-point number 0100 0011 1001 0100 represents
$$01000011.10010100_2 = 2^6 + 2^1 + 2^0 + 2^{-1} + 2^{-4} + 2^{-6} = 64 + 2 + 1 + 0.5 + 0.0625 + 0.015625 = 67.578125_{10}$$
For signed fixed-point numbers, one bit represents the plus or minus sign. With 16 bits, only 15 bits are left to store the integer and fractional parts, e.g., 7 bits for the integer part and 8 bits for the fractional part. To express a decimal number in fixed-point format, the decimal real number is converted to a binary number with a radix point and padded to the specified bit length. The decimal real number $-18.625_{10}$ is converted to $-10010.101_2$ and its fixed-point representation is 1 0010010 10100000.
A floating-point number contains three parts: a sign, an exponent, and a fraction. Floating-point numbers are written in scientific notation: ±fractionEexponent. For example, 1.432456E5 means $1.432456\times10^{5} = 143245.6$ and 3.214065E-3 means $3.214065\times10^{-3} = 0.003214065$.
In a computer, floating-point number representation uses scientific notation in the binary number system. A popular standard for floating-point number representation is IEEE 754. The 32-bit single-precision floating-point format of IEEE 754 is shown below:
bit positions | 31 | 30 to 23 | 22 to 0 |
field | s | exponent | fraction |
bits | s | $e_7 e_6 e_5 e_4 e_3 e_2 e_1 e_0$ | $f_{22} f_{21} f_{20} \ldots f_2 f_1 f_0$ |
where the most significant bit (bit 31) is the sign bit, bits 23 to 30 are the exponent bits, and bits 0 to 22 are the fraction bits. These three parts have the following meanings:
The sign bit is 0 for a positive floating-point number and 1 for a negative one.
The 8-bit exponent (bits 23 to 30) takes values from 1 to 254 for ordinary numbers (0 and 255 are reserved for special values) and has a bias of 127, i.e., the actual exponent is the value of the 8-bit exponent $e_7e_6e_5e_4e_3e_2e_1e_0$ minus 127.
The 23-bit fraction (bits 0 to 22) is the value of a binary fraction, i.e., the significand has an implicit integer part of 1 and a fractional part $f_{22}f_{21}\ldots f_1f_0$.
Let sign(s) be the plus sign (+) if the sign bit is 0 and the minus sign (-) if the sign bit is 1. Let e be the binary number $e_7e_6e_5e_4e_3e_2e_1e_0$. Also, let f be the binary fractional number $0.f_{22}f_{21}\ldots f_1f_0$. The floating-point number of the 32-bit IEEE 754 standard is calculated as
$$\mathrm{sign}(s)\,(1 + f)\times 2^{e-127}.$$
The value of 0 01111011 01100000000000000000000 is
$$1.011_2 \times 2^{123-127} = 1.011_2 \times 2^{-4} = 0.0001011_2 = 0.0859375_{10},$$
where the exponent $01111011_2$ equals $123_{10}$. The value of 1 10000101 10110000000000000000000 is
$$-1.1011_2 \times 2^{133-127} = -1.1011_2 \times 2^{6} = -1101100_2 = -108_{10},$$
where the exponent $10000101_2$ equals $133_{10}$.
To express a decimal number in floating-point format, first convert it to a binary number with a radix point, and then calculate the exponent part and the fraction part. Consider the decimal number $-18.625_{10}$:
$$-18.625_{10} = -10010.101_2 = -1.0010101_2 \times 2^{4}.$$
We obtain that the sign bit is 1, the exponent is $4 + 127 = 131_{10} = 10000011_2$, and the fraction is 0010101. The floating-point representation of -18.625 is written as 1 10000011 00101010000000000000000. If both the exponent part and the fraction part are 0, the floating-point number is 0.
A C program contains a number of variables, each of which represents a memory location. (Issues concerning program variables will be discussed in Section 3.3.) Each program variable must be declared as an instance of a data type. Three basic types are supported in the C programming language: characters, integers, and floating-point numbers. The integer and floating-point types have several variations. The primitive data types, their sizes (in bytes), and their value ranges are listed in the following table:
Data type | Description | Size | Range |
char | character | 1 | -128 to 127 |
int | integer | 4 | -2,147,483,648 to 2,147,483,647 |
short | short integer | 2 | -32,768 to 32,767 |
long | long integer | 4 | -2,147,483,648 to 2,147,483,647 |
long long | long long integer | 8 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
unsigned char | unsigned character | 1 | 0 to 255 |
unsigned | unsigned integer | 4 | 0 to 4,294,967,295 |
unsigned short | unsigned short integer | 2 | 0 to 65,535 |
unsigned long | unsigned long integer | 4 | 0 to 4,294,967,295 |
unsigned long long | unsigned long long integer | 8 | 0 to 18,446,744,073,709,551,615 |
float | single precision floating-point number | 4 | $\pm1.17549435\times10^{-38}$ to $\pm3.40282347\times10^{38}$ |
double | double precision floating-point number | 8 | $\pm2.2250738585072014\times10^{-308}$ to $\pm1.7976931348623157\times10^{308}$ |
The size of a data type depends on the CPU and the compiler, so the sizes in the above table may vary from one computer to another. The following C program size_of.c outputs the size of each data type using the C operator sizeof, which takes a type name and yields the number of bytes of that type.
#include <stdio.h>
The output of size_of.c is shown below:
Memory size of C primitive data types (bytes):
Character Data Type
In C programs, a character constant is written between a pair of single quotation marks, e.g., 'A', 'a', ';', etc., and is encoded using the ASCII encoding scheme. Most printable characters can appear as a character by themselves. However, a small set of characters must be preceded by a backslash to form a character, in order to avoid ambiguity. A character preceded by a backslash is called an escape sequence. The escape sequences of the C programming language, including some control characters, are listed below:
Escape sequence | Name | Meaning |
\a | Alert | output an audible or visible alert signal. |
\b | Backspace | move the cursor back one position without removing the character. |
\f | Form feed | move the cursor to the beginning of the next page. |
\n | New line | move the cursor to the beginning of the next line. |
\r | Carriage return | move the cursor to the beginning of the current line. |
\t | Horizontal tab | move the cursor to the next horizontal tab position. |
\v | Vertical tab | move the cursor to the next vertical tab position. |
\' | | output a single quote. |
\" | | output a double quote. |
\? | | output a question mark. |
\\ | | output a backslash. |
\0 | Null | output a null character. |
\ddd | | define a character with octal digits, where ddd is an octal number with a value between 0 and $127_{10}$. |
\xdd | | define a character with hexadecimal digits, where dd is a hexadecimal number with a value between 0 and $127_{10}$. |
In C programs, a string is declared as an array of characters ending with a null character. (Arrays will be discussed in detail in Section 5.2.) Program characters.c manipulates a string and prints out the original and new strings. Line 4 declares string as an array of 10 characters and assigns ABC as its initial value. Line 6 outputs the original string. Lines 7 through 13 change the contents of string.
#include <stdio.h>
The original string: ABC |
The octal number $131_8$ is the ASCII code of character Y and the hexadecimal number $5A_{16}$ is the ASCII code of character Z. Hence, Lines 11 and 12 are equivalent to:
string[7] = 'Y';
string[8] = 'Z';
In the C programming language, characters are treated as integers, i.e., it is possible to do arithmetic on characters. Hence, Lines 11 and 12 can also be written with character arithmetic:
string[7] = string[6] + 1;
string[8] = string[7] + 1;
The contents of the character array string are shown below:
array index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
original string | A | B | C | \0 | | | | | | |
new string | A | B | C | \t | \ | " | X | Y | Z | \0 |
ASCII codes have values ranging from 0 to 127. Since a character occupies 8 bits, a signed character can hold values from -128 to 127. In other words, the bytes of multi-byte characters, such as Chinese characters, can be stored in a character string. However, the C programming language does not allow direct assignment of a multi-byte character to a character variable or array element. For example, the assignment
char c = '中';
will cause the following compiler warnings in Dev-C++:
[Warning] multi-character character constant
[Warning] overflow in implicit constant conversion
To read multi-byte character strings, a program can use scanf() or functions in standard library string.h. Program character_big5_encoding.c reads a string and outputs the hexadecimal values of each character code.
#include <stdio.h>
資訊工程學系 |
From the input string and output codes, it is easy to observe that 資 is encoded to B8EA, 訊 is encoded to B054, 工 is encoded to A475, 程 is encoded to B57B, 學 is encoded to BEC7, and 系 is encoded to A874, using Big5 encoding scheme. This encoding can be verified using program character_big5_decoding.c:
#include <stdio.h>

資訊工程學系
Integer Data Type
Integer types in C have many variations. The most commonly used type is int. All signed integer types are stored using two's complement, and their sizes and ranges are listed in the table of primitive types. Program integer_max_min.c shows the maximum and minimum values of type int.
#include <stdio.h>

maximum integer value: 2147483647
The next program, integer_short_long.c, shows that assigning a variable of the normal integer type to a variable of the short integer type (Line 10) may cause overflow, whereas assigning a variable of the normal integer type to a variable of the long integer type (Line 12) is always safe.
#include <stdio.h>

normal integer: 158423
Floating-point Data Type
Floating-point numbers in a C program can be printed in fixed-point format or floating-point format. Program float_format.c shows the output of 134.56789 in fixed-point format (Line 7) and floating-point format (Line 8). Note that the output values are not exactly 134.56789, since a floating-point representation is only an approximation. This fact makes equality tests on floating-point numbers more complicated. We will discuss this issue in Section 4.2.
#include <stdio.h>

fixed-point format: 134.567886
Additional format specifications can be added to an integer or floating-point specifier to make the output more readable. For example, %4d outputs an integer in at least four character positions, padding with spaces on the left if the integer (including its sign) is shorter than four characters; if the integer is four characters or longer, it is printed in full. Specifier %6.2f prints a floating-point number with two fractional digits in a field at least six positions wide, which leaves room for three integer digits, the decimal point, and the two fractional digits. Program specifier.c shows some examples of width and precision formats. Lines 5 and 6 produce the first two output lines, which begin with two spaces, and Line 7 produces the third output line, which occupies five character positions. Lines 8 and 9 generate the fourth and fifth output lines, each of which begins with a space and has two fractional digits. The output of Line 10 has seven characters in total.
#include <stdio.h>

12 3232.53
In the theory of programming languages, a strongly typed language allows a compiler to detect type errors at compile time. However, the C programming language does not strictly enforce strong type checking, for it allows variables of different types to be assigned to each other. Program primitve_type_not_casting.c tries to print variables of character type, integer type, and floating-point type. The program compiles without any error, but the output shows that some of the values are incorrect.
#include <stdio.h>
return 0;
The output shows that printing a character as an integer (Line 9) or an integer as a character (Line 12) yields a correct answer. However, printing a character or an integer as a floating-point number (Lines 10 and 13) or printing a floating-point number as a character or an integer (Lines 15 and 16) yields an incorrect answer.
Character as integer: 65
Floating-point as integer: -1610612736
To ensure program correctness, a proper type cast must be added in front of a variable. Type casting explicitly places the name of the intended type, in parentheses, in front of the variable or expression to be converted. For example, in Lines 10 and 12 of program primitve_type_casting.c, character variable c and integer variable i are explicitly cast to type float. In Lines 15 and 16, floating-point variable r is cast to types char and int.
#include <stdio.h>
The output of program primitve_type_casting.c is shown below:
Character as integer: 65
With proper casts, the output gives correct values. Character A is converted to floating-point number 6.500000E+001, and integer 49 is converted to floating-point number 4.900000E+001. When a floating-point number is cast to char or int, its fractional part is simply removed. Integer 97 is exactly the decimal value of the ASCII code of character a. Of course, the floating-point numbers can also be printed using the fixed-point specifier %f.