Table of Contents

Previous: 1. An Overview of Computer Hardware and Software Systems

Next: 3. Basic Computer Operations


2  Data Representations in Computers

The contents of computer storage are strings of 1's and 0's. Different types of data, e.g., characters, integers, and real numbers, have different representations in computers. In this chapter, we first give an overview of the terminology, the binary prefixes, used to quantify computer storage, length, and time. The numeral system most frequently used in a computer is not the decimal system; the three most common numeral systems are binary, octal, and hexadecimal. We explain these three numeral systems and the data representations of characters, integers, and real numbers. Finally, we describe the primitive data types of the C programming language.

2.1 Binary Prefix

The smallest unit of data storage in a computer is a bit. A single bit has a binary state: 0 or 1, true or false, or any two mutually exclusive states. A collection of bits, usually of length 8, is called a byte, which is the basic unit of computer storage. Another term, "word", refers to a group of consecutive bytes whose length equals the CPU's largest register or largest addressable memory location. For example, for IBM-compatible PCs a word is 16 bits, i.e., 2 bytes, because of the early Intel 80x86 processors. For many recent CPUs, however, a word is 32 bits or even 64 bits.

There are binary prefixes to quantify large amounts of bits, bytes, and words.

Name     Base 2    Base 10     Numeral
kilo     2^10      ≈ 10^3      thousand
mega     2^20      ≈ 10^6      million
giga     2^30      ≈ 10^9      billion
tera     2^40      ≈ 10^12     trillion
peta     2^50      ≈ 10^15     quadrillion
exa      2^60      ≈ 10^18     quintillion
zetta    2^70      ≈ 10^21     sextillion
yotta    2^80      ≈ 10^24     septillion

The first three names are the ones most used in computer science. The term kilobyte means 2^10 bytes, i.e., 1,024 bytes, about one thousand bytes; the term megabyte means 2^20 bytes, i.e., 1,048,576 bytes, about one million bytes; the term gigabyte means 2^30 bytes, i.e., 1,073,741,824 bytes, about one billion bytes. The three terms are usually abbreviated as KB, MB, and GB, respectively. Hence, a 1.8 KB file is about 1,800 bytes in size; a 128 MB RAM is about 128,000,000 bytes; and a 40 GB hard disk is about 40,000,000,000 bytes.
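As a quick check of these values, the following short C sketch (the variable names are our own) computes 2^10, 2^20, and 2^30 with shift operations and prints them:

#include <stdio.h>

int main(void) {
  /* A minimal check of the binary prefixes: 2^10, 2^20, and 2^30 bytes. */
  unsigned long kb = 1UL << 10;   /* kilobyte: 1,024 bytes */
  unsigned long mb = 1UL << 20;   /* megabyte: 1,048,576 bytes */
  unsigned long gb = 1UL << 30;   /* gigabyte: 1,073,741,824 bytes */

  printf("KB = %lu bytes\n", kb);
  printf("MB = %lu bytes\n", mb);
  printf("GB = %lu bytes\n", gb);
  return 0;
}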

There are also prefixes for quantifying small amounts of time, length, and weight. These prefixes are expressed in base 10.

Name     Base 10    Numeral
milli    10^-3      thousandth
micro    10^-6      millionth
nano     10^-9      billionth
pico     10^-12     trillionth

One millisecond, 1 ms, is one thousandth of a second, i.e., 10^-3 seconds. One microsecond, 1 µs, is one millionth of a second, i.e., 10^-6 seconds. One nanosecond, 1 ns, is one billionth of a second, i.e., 10^-9 seconds.

Several such quantities are frequently used in computer science. We give some examples below:

Description                                      Notation     Meaning
network bandwidth                                128 kbps     128 kilobits per second
frequency of CPU clock                           600 MHz      600 megahertz = 600×10^6 clock cycles per second;
                                                              1/(600×10^6) seconds per clock cycle ≈ 1.67 nanoseconds per clock cycle
microprocessor speed (in terms of instructions)  600 MIPS     600 mega (million) instructions per second
microprocessor speed (in terms of operations)    1.4 GFlops   1.4 giga (billion) floating-point operations per second

2.2  Binary Numerals, Octal Numerals, and Hexadecimal Numerals

The numeral system we use daily is the decimal system, i.e., base 10. In the decimal number system there are ten digit symbols, 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, which together with the decimal point and the plus (+) and minus (-) signs express decimal numbers.

Three other numeral systems are used in computer science: binary numerals (base 2), octal numerals (base 8), and hexadecimal numerals (base 16). For binary numerals, there are only two digit symbols, 0 and 1. Octal numerals are composed of 8 digit symbols, 0, 1, 2, 3, 4, 5, 6, and 7. Hexadecimal numerals are composed of 16 digit symbols, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. The letter digits A, B, C, D, E, and F of hexadecimal numerals denote the decimal values 10, 11, 12, 13, 14, and 15, respectively. The following table shows the equivalent decimal, binary, octal, and hexadecimal numbers from 0_10 to 15_10.

Dec.   Bin.   Oct.   Hex.      Dec.   Bin.   Oct.   Hex.
0      0000   00     0         8      1000   10     8
1      0001   01     1         9      1001   11     9
2      0010   02     2         10     1010   12     A
3      0011   03     3         11     1011   13     B
4      0100   04     4         12     1100   14     C
5      0101   05     5         13     1101   15     D
6      0110   06     6         14     1110   16     E
7      0111   07     7         15     1111   17     F

A decimal number, say 25084, is actually an expression of powers of 10, i.e.,

25084 = 2×10^4 + 5×10^3 + 0×10^2 + 8×10^1 + 4×10^0.

Similarly, a binary number, say 101101_2, is actually an expression of powers of 2, i.e.,

101101_2 = 1×2^5 + 0×2^4 + 1×2^3 + 1×2^2 + 0×2^1 + 1×2^0 = 45.

Note that when there is a chance of ambiguity, a subscript is used to indicate the base of a number. Often, terms with coefficient 0 are omitted and 101101_2 is written as:

101101_2 = 1×2^5 + 1×2^3 + 1×2^2 + 1×2^0.

An octal number, say 763_8, is an expression of powers of 8, i.e.,

763_8 = 7×8^2 + 6×8^1 + 3×8^0 = 499.

A hexadecimal number, say 9E4B_16, is an expression of powers of 16, i.e.,

9E4B_16 = 9×16^3 + 14×16^2 + 4×16^1 + 11×16^0 = 40523.

For a number with a fixed length of digits, leading zeros are added if the number is shorter than the given fixed length. For example, if we designate a binary number of 8 bits, then 101101_2 is often written as 00101101_2. We now show examples of converting a number from one numeral system to the other numeral systems.

Example 1: Convert decimal number 40206 to a binary number, an octal number, and a hexadecimal number.

The stepwise conversion to a binary number is illustrated below:

Input: decimal number d = 40206.

Output: binary number b of the same value as d.

  1. Let b be a "NULL" (empty) binary value.

  2. Divide d by 2. Attach the remainder on the left-hand side of b and set d to be the quotient.

  3. If d ≥ 2, repeat Step 2; otherwise, attach d on the left-hand side of b.

Operational steps:

Initial: (d, b) = (40206, )

40206 ÷ 2 = 20103 ... remainder 0    (20103, 0)
20103 ÷ 2 = 10051 ... remainder 1    (10051, 10)
10051 ÷ 2 = 5025  ... remainder 1    (5025, 110)
5025 ÷ 2 = 2512   ... remainder 1    (2512, 1110)
2512 ÷ 2 = 1256   ... remainder 0    (1256, 01110)
1256 ÷ 2 = 628    ... remainder 0    (628, 001110)
628 ÷ 2 = 314     ... remainder 0    (314, 0001110)
314 ÷ 2 = 157     ... remainder 0    (157, 00001110)
157 ÷ 2 = 78      ... remainder 1    (78, 100001110)
78 ÷ 2 = 39       ... remainder 0    (39, 0100001110)
39 ÷ 2 = 19       ... remainder 1    (19, 10100001110)
19 ÷ 2 = 9        ... remainder 1    (9, 110100001110)
9 ÷ 2 = 4         ... remainder 1    (4, 1110100001110)
4 ÷ 2 = 2         ... remainder 0    (2, 01110100001110)
2 ÷ 2 = 1         ... remainder 0    (1, 001110100001110)

∴  40206_10 = 1001 1101 0000 1110_2
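The repeated-division procedure above translates directly into a short C function. The sketch below (the function name and the buffer size are our own choices) collects the remainders of dividing by 2 and prints them from the most significant bit down:

#include <stdio.h>

/* Collect the remainders of repeated division by 2 and print them in
   reverse order, which yields the binary representation. */
void print_binary(unsigned int d) {
  char digits[32];   /* remainders, least significant first */
  int n = 0;

  do {
    digits[n++] = '0' + (d % 2);   /* attach the remainder */
    d = d / 2;                     /* d becomes the quotient */
  } while (d > 0);

  while (n > 0)                    /* print in reverse order */
    putchar(digits[--n]);
  putchar('\n');
}

int main(void) {
  print_binary(40206);   /* prints 1001110100001110 */
  return 0;
}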

The stepwise conversion to an octal number is illustrated below:

Input: decimal number d = 40206.

Output: octal number o of the same value as d.

  1. Let o be a "NULL" (empty) octal value.

  2. Divide d by 8. Attach the remainder on the left-hand side of o and set d to be the quotient.

  3. If d ≥ 8, repeat Step 2; otherwise, attach d on the left-hand side of o.

Operational steps:

Initial: (d, o) = (40206, )

40206 ÷ 8 = 5025 ... remainder 6    (5025, 6)
5025 ÷ 8 = 628   ... remainder 1    (628, 16)
628 ÷ 8 = 78     ... remainder 4    (78, 416)
78 ÷ 8 = 9       ... remainder 6    (9, 6416)
9 ÷ 8 = 1        ... remainder 1    (1, 16416)

∴  40206_10 = 116 416_8

The stepwise conversion to a hexadecimal number is illustrated below:

Input: decimal number d = 40206.

Output: hexadecimal number h of the same value as d.

  1. Let h be a "NULL" (empty) hexadecimal value.

  2. Divide d by 16. Attach the remainder on the left-hand side of h and set d to be the quotient.

  3. If d ≥ 16, repeat Step 2; otherwise, attach d on the left-hand side of h.

Operational steps:

Initial: (d, h) = (40206, )

40206 ÷ 16 = 2512 ... remainder 14 (E)    (2512, E)
2512 ÷ 16 = 157   ... remainder 0         (157, 0E)
157 ÷ 16 = 9      ... remainder 13 (D)    (9, D0E)

∴  40206_10 = 9D0E_16

If the binary number 1001110100001110 is grouped into groups of three bits starting from the right, 1 001 110 100 001 110, it is easy to convert each group of three bits to an octal digit and obtain 116416_8. Similarly, if the binary number 1001110100001110 is grouped into groups of four bits starting from the right, 1001 1101 0000 1110, it is easy to convert each group of four bits to a hexadecimal digit and obtain 9D0E_16. Conversely, it is possible to convert a decimal number into an octal or a hexadecimal number first and then convert each octal or hexadecimal digit into binary bits. This procedure requires fewer steps to convert a large decimal number to a binary one.
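These hand conversions can also be checked with a few printf format conversions; the sketch below (our own example) prints the same value in decimal, octal, and hexadecimal:

#include <stdio.h>

int main(void) {
  /* printf can display the same integer in different bases, which makes
     it easy to verify the conversions above. */
  int d = 40206;

  printf("decimal:     %d\n", d);   /* 40206  */
  printf("octal:       %o\n", d);   /* 116416 */
  printf("hexadecimal: %X\n", d);   /* 9D0E   */
  return 0;
}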

Similarly to decimal arithmetic, we can add and multiply pairs of binary numbers, octal numbers, and hexadecimal numbers. Addition of the two binary numbers 10010010 + 00111011 is shown as follows:

carry     1 1     1
        1 0 0 1 0 0 1 0
    +   0 0 1 1 1 0 1 1
    -------------------
        1 1 0 0 1 1 0 1

Addition of two octal numbers, 2614 + 1356, is similar:

carry   1   1
        2 6 1 4
    +   1 3 5 6
    -----------
        4 1 7 2

Let us verify the addition: 2614_8 + 1356_8 = 1420_10 + 750_10 = 2170_10 = 4172_8. We leave addition of two hexadecimal numbers as an exercise.

Multiplication of two binary numbers, 10110 × 1010, is shown as follows:

 

            1 0 1 1 0
      ×       1 0 1 0
      ---------------
            0 0 0 0 0
          1 0 1 1 0
        0 0 0 0 0
      1 0 1 1 0
      ---------------
      1 1 0 1 1 1 0 0

For multiplication of octal numbers, we first consider products of two single octal digits, e.g., 5_8 × 7_8 = 43_8, 6_8 × 7_8 = 52_8, and 3_8 × 7_8 = 25_8, and products of a multiple-digit number and a single digit, e.g., 563_8 × 7_8 = 4300_8 + 520_8 + 25_8 = 5045_8. Multiplication of the two octal numbers 563 × 247 is illustrated below:

 

                5 6 3
      ×         2 4 7
      ---------------
              5 0 4 5
            2 7 1 4
          1 3 4 6
      ---------------
          1 7 1 0 0 5

Binary numbers, octal numbers, and hexadecimal numbers with radix points work just like decimal numbers with decimal points. For example, the binary fractional number 11.101 is converted to a decimal fractional number as follows:

11.101_2 = 1×2^1 + 1×2^0 + 1×2^-1 + 0×2^-2 + 1×2^-3 = 2 + 1 + 0.5 + 0.125 = 3.625_10.

The conversion of a decimal fractional number to a binary number is done in two steps: first convert the integer part, then convert the fractional part. For example, the conversion of 3.625_10 to a binary fractional number first converts the integer part 3_10 to 11_2 as before. The second step converts the fractional part 0.625_10 to a binary fraction. The fraction conversion is described and illustrated below.

Input: decimal fraction d = 0.625.

Output: binary fraction b of the same value as d.

  1. Let b be the binary value "0." (only the integer digit 0 and the radix point).

  2. Multiply d by 2. Attach the digit in the integer part of the product (either 0 or 1) to the right-hand side of b and set d to be the fractional part of the product.

  3. If d ≠ 0, repeat Step 2; otherwise, stop.

Operational steps:

Initial: (d, b) = (0.625, 0.)

0.625 × 2 = 1.25    (0.25, 0.1)
0.25 × 2 = 0.5      (0.5, 0.10)
0.5 × 2 = 1.0       (0.0, 0.101)

∴  0.625_10 = 0.101_2
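The repeated-multiplication procedure is also easy to sketch in C. In the example below the function name and the bit limit are our own choices; it peels off one binary digit per round and also previews the next example, 0.8:

#include <stdio.h>

/* Multiply the fraction by 2 and peel off the integer digit each round. */
void print_binary_fraction(double d, int max_bits) {
  printf("0.");
  for (int i = 0; i < max_bits && d != 0.0; i++) {
    d = d * 2.0;
    if (d >= 1.0) {      /* integer part of the product is 1 */
      putchar('1');
      d = d - 1.0;       /* keep only the fractional part */
    } else {
      putchar('0');
    }
  }
  putchar('\n');
}

int main(void) {
  print_binary_fraction(0.625, 16);   /* prints 0.101 */
  print_binary_fraction(0.8, 16);     /* prints 0.1100110011001100 (truncated) */
  return 0;
}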

Therefore, decimal number 3.625_10 is converted to 11.101_2. However, conversion of decimal fractions to binary fractions is not always exact. Let us try to convert 0.8_10 to a binary fraction.

Operational steps:

Initial: (d, b) = (0.8, 0.)

0.8 × 2 = 1.6    (0.6, 0.1)
0.6 × 2 = 1.2    (0.2, 0.11)
0.2 × 2 = 0.4    (0.4, 0.110)
0.4 × 2 = 0.8    (0.8, 0.1100)
0.8 × 2 = 1.6    (0.6, 0.11001)
0.6 × 2 = 1.2    (0.2, 0.110011)
0.2 × 2 = 0.4    (0.4, 0.1100110)
0.4 × 2 = 0.8    (0.8, 0.11001100)
...
0.4 × 2 = 0.8    (0.8, 0.110011001100...1100)

∴  0.8_10 = 0.110011001100..._2 (a cyclic binary fraction)

In practice, a decimal fraction is converted to an approximating binary fraction of a specific length. For example, if the binary fraction is fixed to 8 bits, 0.8_10 is converted to 0.11001100_2, which is actually 0.796875_10, an approximation of 0.8_10.
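This is also why a float variable holding 0.8 does not print back as exactly 0.8 when enough digits are requested. The minimal sketch below shows the effect; the exact digits printed depend on the platform's floating-point format and rounding:

#include <stdio.h>

int main(void) {
  /* 0.8 has no finite binary expansion, so the stored float is only an
     approximation; printing many digits makes this visible. */
  float f = 0.8f;

  printf("%.10f\n", f);   /* prints a value close to, but not exactly, 0.8 */
  return 0;
}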

Binary numbers with a negative sign are also expressed in a specific number of bits. We will consider 8-bit signed binary numbers. A positive binary number, say 37_10, is expressed as 00100101_2. There are two methods for representing negative binary numbers. The first representation is one's complement, which is the bitwise NOT of the corresponding positive binary number; e.g., -37_10 is the one's complement of 00100101_2, which is 11011010_2. For 8-bit signed numbers using one's complement, only integers ranging from -127_10 (10000000_2) to 127_10 (01111111_2) can be expressed. For example, 320_10 cannot be expressed as an 8-bit signed binary number, since 320_10 = 1 0100 0000_2, which requires at least 9 bits. In one's complement, the most significant bit is used as the sign bit. If the most significant bit is 0, the binary number is positive; if the most significant bit is 1, the binary number is negative. One's complement has the drawback of two zeros, 00000000_2 and 11111111_2.

The second representation of signed binary numbers is two's complement. For an 8-bit binary number, the most significant bit is used as the sign bit. A negative binary number is represented in two's complement by taking the one's complement of its absolute value and then adding one to it. For example, -37_10 = NOT(00100101_2) + 1_2 = 11011010_2 + 1_2 = 11011011_2. 8-bit signed numbers using two's complement range from -128_10 (10000000_2) to 127_10 (01111111_2). Note that there is no +128_10 among 8-bit two's complement binary numbers.
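The two complements are easy to compute with C's bitwise NOT operator. The sketch below uses an unsigned char to hold an 8-bit pattern; reinterpreting the result as signed char gives the negative value on a typical two's-complement machine:

#include <stdio.h>

int main(void) {
  /* For an 8-bit value, the one's complement is the bitwise NOT and the
     two's complement is the bitwise NOT plus one. */
  unsigned char x = 37;          /* 0010 0101 */
  unsigned char ones = ~x;       /* 1101 1010 = 218 */
  unsigned char twos = ~x + 1;   /* 1101 1011 = 219 */

  printf("x = %d, one's complement = %d, two's complement = %d\n",
         x, ones, twos);
  /* The two's complement bit pattern, read as a signed 8-bit value,
     is the negative number (on a two's-complement machine). */
  printf("as signed char: %d\n", (signed char) twos);   /* -37 */
  return 0;
}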

Two's complement representation is well suited to binary arithmetic. Consider 8-bit signed binary numbers. Addition of two positive numbers is the same as before: 27_10 + 12_10 = 00011011_2 + 00001100_2 = 00100111_2. Additions involving negative numbers are illustrated below:

carry 1 1 1 1
        0 0 0 1 1 0 1 1    (27_10)
    +   1 1 1 1 0 1 0 0    (-12_10 = NOT(00001100_2) + 1 = 11110011_2 + 1 = 11110100_2)
    -------------------
        0 0 0 0 1 1 1 1    (15_10; the carry out of the sign bit is discarded)


carry         1 1
        1 1 1 0 0 1 0 1    (-27_10 = NOT(00011011_2) + 1 = 11100100_2 + 1 = 11100101_2)
    +   0 0 0 0 1 1 0 0    (12_10)
    -------------------
        1 1 1 1 0 0 0 1    (-15_10 = NOT(00001111_2) + 1 = 11110000_2 + 1 = 11110001_2)


carry 1 1 1     1
        1 1 1 0 0 1 0 1    (-27_10)
    +   1 1 1 1 0 1 0 0    (-12_10)
    -------------------
        1 1 0 1 1 0 0 1    (-39_10 = NOT(00100111_2) + 1 = 11011000_2 + 1 = 11011001_2)

Subtraction of two positive numbers is done by first converting the second operand to its two's complement and then adding the two numbers. For example,

27_10 - 12_10 = 27_10 + (-12_10) = 00011011_2 + 11110100_2 = 00001111_2 = 15_10.

However, the following addition results in an overflow error. It is considered an invalid operation.

carry   1         1
        0 1 0 0 1 0 1 1    (75_10)
    +   0 1 0 1 0 0 1 0    (82_10)
    -------------------
        1 0 0 1 1 1 0 1    (Overflow! 10011101_2 is a negative number, -99_10.)

Multiplication involving negative numbers converts the negative operand(s) to absolute values, calculates the product of the two positive numbers, and then converts the product back to its two's complement representation if exactly one of the operands is negative.
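The subtraction-as-addition idea and the overflow example can both be reproduced with 8-bit values in C. The sketch below assumes signed char is an 8-bit two's-complement type, which is the case on essentially all current platforms:

#include <stdio.h>

int main(void) {
  /* Subtraction as addition of the two's complement, using 8-bit values. */
  unsigned char a = 27, b = 12;
  unsigned char diff = a + (unsigned char)(~b + 1);   /* 27 + (-12) */
  printf("27 - 12 = %d\n", (signed char) diff);        /* 15 */

  /* Adding 75 and 82 exceeds the 8-bit signed range (-128 to 127): the
     result bit pattern 10011101 reads as -99 when taken as signed. */
  signed char x = 75, y = 82;
  signed char sum = (signed char)(x + y);
  printf("75 + 82 as 8-bit signed = %d\n", sum);        /* -99 */
  return 0;
}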

2.3  Characters

Characters are represented in a computer as bit strings. The most popular encoding scheme for English characters is ASCII (American Standard Code for Information Interchange). Each ASCII character occupies one byte, but only seven bits are used; the most significant bit of an ASCII code is always 0. The ASCII character set is divided into two categories: control characters and printable characters. The printable characters range from decimal value 32 to 126. The ASCII table is shown below.

Dec.  Oct.  Hex.  Bin.       Character  Description
0     000   00    00000000   NUL        Null character
1     001   01    00000001   SOH        Start of header
2     002   02    00000010   STX        Start of text
3     003   03    00000011   ETX        End of text
4     004   04    00000100   EOT        End of transmission
5     005   05    00000101   ENQ        Enquiry
6     006   06    00000110   ACK        Acknowledgement
7     007   07    00000111   BEL        Bell
8     010   08    00001000   BS         Backspace
9     011   09    00001001   HT         Horizontal tab
10    012   0A    00001010   LF         Line feed
11    013   0B    00001011   VT         Vertical tab
12    014   0C    00001100   FF         Form feed
13    015   0D    00001101   CR         Carriage return
14    016   0E    00001110   SO         Shift out
15    017   0F    00001111   SI         Shift in
16    020   10    00010000   DLE        Data link escape
17    021   11    00010001   DC1        Device control 1 (XON)
18    022   12    00010010   DC2        Device control 2
19    023   13    00010011   DC3        Device control 3 (XOFF)
20    024   14    00010100   DC4        Device control 4
21    025   15    00010101   NAK        Negative acknowledgement
22    026   16    00010110   SYN        Synchronous idle
23    027   17    00010111   ETB        End of transfer block
24    030   18    00011000   CAN        Cancel
25    031   19    00011001   EM         End of medium
26    032   1A    00011010   SUB        Substitute
27    033   1B    00011011   ESC        Escape
28    034   1C    00011100   FS         File separator
29    035   1D    00011101   GS         Group separator
30    036   1E    00011110   RS         Record separator
31    037   1F    00011111   US         Unit separator
32    040   20    00100000   SP         Space
33    041   21    00100001   !
34    042   22    00100010   "
35    043   23    00100011   #
36    044   24    00100100   $
37    045   25    00100101   %
38    046   26    00100110   &
39    047   27    00100111   '
40    050   28    00101000   (
41    051   29    00101001   )
42    052   2A    00101010   *
43    053   2B    00101011   +
44    054   2C    00101100   ,
45    055   2D    00101101   -
46    056   2E    00101110   .
47    057   2F    00101111   /
48    060   30    00110000   0
49    061   31    00110001   1
50    062   32    00110010   2
51    063   33    00110011   3
52    064   34    00110100   4
53    065   35    00110101   5
54    066   36    00110110   6
55    067   37    00110111   7
56    070   38    00111000   8
57    071   39    00111001   9
58    072   3A    00111010   :
59    073   3B    00111011   ;
60    074   3C    00111100   <
61    075   3D    00111101   =
62    076   3E    00111110   >
63    077   3F    00111111   ?
64    100   40    01000000   @
65    101   41    01000001   A
66    102   42    01000010   B
67    103   43    01000011   C
68    104   44    01000100   D
69    105   45    01000101   E
70    106   46    01000110   F
71    107   47    01000111   G
72    110   48    01001000   H
73    111   49    01001001   I
74    112   4A    01001010   J
75    113   4B    01001011   K
76    114   4C    01001100   L
77    115   4D    01001101   M
78    116   4E    01001110   N
79    117   4F    01001111   O
80    120   50    01010000   P
81    121   51    01010001   Q
82    122   52    01010010   R
83    123   53    01010011   S
84    124   54    01010100   T
85    125   55    01010101   U
86    126   56    01010110   V
87    127   57    01010111   W
88    130   58    01011000   X
89    131   59    01011001   Y
90    132   5A    01011010   Z
91    133   5B    01011011   [
92    134   5C    01011100   \
93    135   5D    01011101   ]
94    136   5E    01011110   ^
95    137   5F    01011111   _
96    140   60    01100000   `
97    141   61    01100001   a
98    142   62    01100010   b
99    143   63    01100011   c
100   144   64    01100100   d
101   145   65    01100101   e
102   146   66    01100110   f
103   147   67    01100111   g
104   150   68    01101000   h
105   151   69    01101001   i
106   152   6A    01101010   j
107   153   6B    01101011   k
108   154   6C    01101100   l
109   155   6D    01101101   m
110   156   6E    01101110   n
111   157   6F    01101111   o
112   160   70    01110000   p
113   161   71    01110001   q
114   162   72    01110010   r
115   163   73    01110011   s
116   164   74    01110100   t
117   165   75    01110101   u
118   166   76    01110110   v
119   167   77    01110111   w
120   170   78    01111000   x
121   171   79    01111001   y
122   172   7A    01111010   z
123   173   7B    01111011   {
124   174   7C    01111100   |
125   175   7D    01111101   }
126   176   7E    01111110   ~
127   177   7F    01111111   DEL        Delete
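As a quick check of the table above, a short C sketch (our own example) can print a character's code in the same three bases:

#include <stdio.h>

int main(void) {
  /* Print the code of a character in decimal, octal, and hexadecimal. */
  char c = 'A';

  printf("'%c' = %d (dec) = %o (oct) = %X (hex)\n", c, c, c, c);   /* 65, 101, 41 */
  return 0;
}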

A computer does not process only English text; it must also be able to deal with text in other languages such as Chinese, Japanese, and Spanish. To accommodate other languages, various character encoding schemes have been developed. For example, two Chinese character encoding schemes are used in different areas: Big5 for traditional Chinese characters and GB for simplified Chinese characters. Both Big5 and GB are 16-bit encoding schemes.

Unicode is an international standard for encoding the characters of all human written languages, developed in coordination with the International Organization for Standardization (ISO/IEC 10646). Unicode has been adopted by many computer systems to support multilingual environments, including Microsoft Windows systems such as Windows NT, Windows 2000, and Windows XP, and Unix-based operating systems such as GNU/Linux, BSD, and Mac OS X. Unicode can be encoded in several forms, including the 8-bit UTF-8 and the 16-bit UTF-16 encoding schemes.

2.4  Integers

In mathematics, the set of integers has an infinite number of elements. However, a computer has only finite resources, a given amount of memory storage, so it is not possible to represent all integers in a computer. There are two basic types of integer representations: unsigned and signed integers.

Unsigned integers are non-negative integers and are represented by their binary values with a fixed number of bits. For Intel 80x86 processors, a word is 16 bits long and an integer is represented as a 16-bit binary number. However, a 16-bit string is too long to read conveniently, so a 16-bit binary number is usually written as a four-digit hexadecimal number. For example, 5876 is represented as the binary number 0001 0110 1111 0100 and is written as the hexadecimal number 16F4. As another example, 39741 is represented as the binary number 1001 1011 0011 1101 and the hexadecimal number 9B3D. Note that the most significant bit of 39741 is one, yet it is a positive integer. For 16-bit unsigned integers, the values range from 0 to 2^16 - 1 (65535_10).

Signed integers are zero, negative, and positive integers and are represented using two's complement. For a 16-bit processor, the most significant bit is used as the sign bit. The values of a signed integer range from -2^15 (-32768_10) to 2^15 - 1 (32767_10). The representation of a signed integer is usually also written in hexadecimal for easy reading. The positive integer 5876_10 is still represented as 16F4_16. However, 39741_10 is too large to be represented as a 16-bit signed integer; its corresponding hexadecimal number 9B3D_16 is in fact the two's complement representation of the negative integer -25795_10.
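The same 16-bit pattern can be inspected both ways in C. The sketch below uses the fixed-width types from stdint.h; the conversion to a signed type is implementation-defined in C, but on two's-complement machines it yields the value described above:

#include <stdio.h>
#include <stdint.h>

int main(void) {
  /* The 16-bit pattern 9B3D reads as 39741 when unsigned and as -25795
     when interpreted as a two's complement signed integer. */
  uint16_t u = 0x9B3D;
  int16_t  s = (int16_t) 0x9B3D;

  printf("unsigned 16-bit: %d\n", u);   /* 39741  */
  printf("signed 16-bit:   %d\n", s);   /* -25795 */
  return 0;
}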

2.5  Real Numbers

Real numbers are represented in a computer using two methods: fixed-point format and floating-point format. In a computer, it is not possible to represent every real number precisely. For example, 1/3 is 0.3333..., which cannot be stored exactly in a finite amount of memory. Both fixed-point numbers and floating-point numbers are only approximations of real numbers.

The representation of fixed-point numbers uses a fixed number of bits: some of the bits store the integer part and the rest store the fractional part after the radix point. Consider a 16-bit representation of fixed-point numbers in which the higher eight bits are the integer part and the lower eight bits are the fractional part. For example, the fixed-point number 0100 0011 1001 0100 is the representation of

01000011.10010100_2 = 2^6 + 2^1 + 2^0 + 2^-1 + 2^-4 + 2^-6

                    = 64 + 2 + 1 + 0.5 + 0.0625 + 0.015625 = 67.578125_10

For signed fixed-point numbers, one bit is used to represent the plus or minus sign. In the case of 16-bit processors, only 15 bits are left to store the integer and fractional parts, e.g., 7 bits for the integer part and 8 bits for the fractional part. To express a decimal number in the fixed-point format, the decimal real number is converted to a binary number with a radix point of the specified bit length. The decimal real number -18.625_10 is converted to -10010.101_2 and its fixed-point representation is 1 0010010 10100000.
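A sketch of this sign-magnitude fixed-point format (1 sign bit, 7 integer bits, 8 fraction bits) is given below. The function names and the truncating conversion are our own choices; scaling by 2^8 = 256 places the fraction in the low eight bits:

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Encode a real number into 1 sign bit, 7 integer bits, 8 fraction bits. */
uint16_t fixed_encode(double x) {
  uint16_t sign = (x < 0) ? 0x8000 : 0x0000;
  uint16_t magnitude = (uint16_t)(fabs(x) * 256.0);   /* 8 fraction bits => scale by 2^8 */
  return sign | (magnitude & 0x7FFF);
}

double fixed_decode(uint16_t v) {
  double magnitude = (v & 0x7FFF) / 256.0;
  return (v & 0x8000) ? -magnitude : magnitude;
}

int main(void) {
  uint16_t v = fixed_encode(-18.625);
  printf("encoded: %04X\n", v);             /* 92A0 = 1 0010010 10100000 */
  printf("decoded: %f\n", fixed_decode(v)); /* -18.625000 */
  return 0;
}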

A floating-point number contains three parts: a sign, an exponent, and a fraction. Floating-point numbers are written in scientific notation: ±fraction E exponent. For example, 1.432456E5 means 1.432456×10^5 = 143245.6 and 3.214065E-3 means 3.214065×10^-3 = 0.003214065.

In a computer, the floating-point representation uses scientific notation in the binary number system. A popular standard for floating-point representation is IEEE 754. The 32-bit single-precision floating-point format of IEEE 754 is shown below:

bit:  31   30 ......... 23   22 ................................ 0
      s    exponent          fraction
      s    e7 e6 e5 e4 e3 e2 e1 e0   f22 f21 f20 f19 f18 f17 f16 f15 f14 f13 f12 f11 f10 f9 f8 f7 f6 f5 f4 f3 f2 f1 f0

where the most significant bit (bit 31) is the sign bit, bits 30 to 23 are the exponent bits, and bits 22 to 0 are the fraction bits. These three parts have the following meanings:

  1. The sign bit is either 0 or 1, denoting a positive or negative floating-point number, respectively.

  2. The 8-bit exponent (bits 30 to 23) takes values from 1 to 254 (0 and 255 are reserved for special values) with a bias of 127, i.e., the actual exponent is the value of the 8-bit field e7e6e5e4e3e2e1e0 minus 127.

  3. The 23-bit fraction (bits 22 to 0) gives the value of a binary fraction: the represented significand has an implicit integer part of 1 and the fractional part f22f21f20f19f18f17f16f15f14f13f12f11f10f9f8f7f6f5f4f3f2f1f0.

Let sign(s) be the plus sign (+) if the sign bit is 0 and the minus sign (-) if the sign bit is 1. Let e be the value of the binary number e7e6e5e4e3e2e1e0. Also, let f be the value of the binary fraction 0.f22f21f20f19f18f17f16f15f14f13f12f11f10f9f8f7f6f5f4f3f2f1f0. The floating-point number of the 32-bit IEEE 754 standard is calculated as

sign(s) (1 + f) × 2^(e - 127).

The value of 0 01111011 01100000000000000000000 is

1.011_2 × 2^(123-127) = 1.011_2 × 2^-4 = 0.0001011_2 = 0.0859375_10,

where the exponent 01111011_2 equals 123_10. The value of 1 10000101 10110000000000000000000 is

-1.1011_2 × 2^(133-127) = -1.1011_2 × 2^6 = -1101100_2 = -108_10,

where the exponent 10000101_2 equals 133_10.

To express a decimal number in the floating-point format, the decimal number is first converted to a binary number with a binary point, and then the exponent part and the fraction part are calculated. Consider the decimal number -18.625_10:

-18.625_10 = -10010.101_2 = -1.0010101_2 × 2^4.

We obtain the sign bit 1, the exponent 4 + 127 = 131_10 = 10000011_2, and the fraction 0010101. The floating-point representation of -18.625 is therefore written as 1 10000011 00101010000000000000000. If both the exponent part and the fraction part are 0, the floating-point number is 0.
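These three fields can be extracted from a float in C with a few shifts and masks. The sketch below assumes float is the 32-bit IEEE 754 single-precision format, which is true on virtually all current platforms:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
  /* Decompose a float into the IEEE 754 sign, exponent, and fraction fields. */
  float x = -18.625f;
  uint32_t bits;

  memcpy(&bits, &x, sizeof bits);            /* reinterpret the bit pattern */
  uint32_t sign     = bits >> 31;            /* bit 31     */
  uint32_t exponent = (bits >> 23) & 0xFF;   /* bits 30-23 */
  uint32_t fraction = bits & 0x7FFFFF;       /* bits 22-0  */

  printf("bits     = %08X\n", bits);         /* C1950000 */
  printf("sign     = %u\n", sign);           /* 1 */
  printf("exponent = %u (actual %d)\n", exponent, (int)exponent - 127);   /* 131, 4 */
  printf("fraction = %06X\n", fraction);     /* 150000 = 0010101 followed by zeros */
  return 0;
}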

2.6  Primitive Data Types in C

A C program contains a number of variables, each of which represents a memory location. (Issues concerning program variables will be discussed in Section 3.3.) Each program variable must be declared as an instance of a data type. Three basic types are supported in the C programming language: characters, integers, and floating-point numbers. Integer and floating-point types have several variations. The primitive data types, their sizes (in bytes), and their value ranges are listed in the following table:

Data type           Description                              Size (bytes)  Range
char                character                                1             -128 to 127
int                 integer                                  4             -2,147,483,648 to 2,147,483,647
short               short integer                            2             -32,768 to 32,767
long                long integer                             4             -2,147,483,648 to 2,147,483,647
long long           long long integer                        8             -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
unsigned char       unsigned character                       1             0 to 255
unsigned            unsigned integer                         4             0 to 4,294,967,295
unsigned short      unsigned short integer                   2             0 to 65,535
unsigned long       unsigned long integer                    4             0 to 4,294,967,295
unsigned long long  unsigned long long integer               8             0 to 18,446,744,073,709,551,615
float               single-precision floating-point number   4             ±1.17549435×10^-38 to ±3.40282347×10^38
double              double-precision floating-point number   8             ±2.2250738585072014×10^-308 to ±1.7976931348623157×10^308

The size of a data type depends on the computer's CPU and compiler, so the sizes in the above table may vary from one system to another. The following C program, size_of.c, outputs the size of each data type using the sizeof operator, which takes a type name and yields the number of bytes that type occupies (sizeof yields a value of type size_t, printed here with %zu).


#include <stdio.h>

int main(void) {

  printf("Memory size of C primitive data types (bytes): \n");
  printf("  char: %zu\n", sizeof(char));
  printf("  int: %zu\n", sizeof(int));
  printf("  short: %zu\n", sizeof(short));
  printf("  long: %zu\n", sizeof(long));
  printf("  long long: %zu\n", sizeof(long long));
  printf("  unsigned: %zu\n", sizeof(unsigned));
  printf("  unsigned short: %zu\n", sizeof(unsigned short));
  printf("  unsigned long: %zu\n", sizeof(unsigned long));
  printf("  unsigned long long: %zu\n", sizeof(unsigned long long));
  printf("  float: %zu\n", sizeof(float));
  printf("  double: %zu\n", sizeof(double));

  return 0;
}

The output of size_of.c is shown as below:

Memory size of C primitive data types (bytes):
  char: 1
  int: 4
  short: 2
  long: 4
  long long: 8
  unsigned: 4
  unsigned short: 2
  unsigned long: 4
  unsigned long long: 8
  float: 4
  double: 8

Character Data Type

In C programs, a character constant is written between a pair of single quotation marks, e.g., 'A', 'a', ';', etc., and is encoded using the ASCII encoding scheme. Most of the printable characters can appear as a character constant by themselves. However, a small set of characters must be preceded by a backslash in order to avoid ambiguity. A character written with a leading backslash is called an escape sequence. The escape sequences, including some control characters, of the C programming language are listed below:

Escape sequence   Name              Meaning
\a                Alert             Output an audible or visible alert signal.
\b                Backspace         Move the cursor back one position without removing the character.
\f                Form feed         Move the cursor to the beginning of the next page.
\n                New line          Move the cursor to the beginning of the next line.
\r                Carriage return   Move the cursor to the beginning of the current line.
\t                Horizontal tab    Move the cursor to the next horizontal tabular position.
\v                Vertical tab      Move the cursor to the next vertical tabular position.
\'                                  Output a single quote.
\"                                  Output a double quote.
\?                                  Output a question mark.
\\                                  Output a backslash.
\0                Null              Output a null character.
\ddd                                Define a character with octal digits, where ddd is an octal number (000 to 177 for ASCII characters).
\xdd                                Define a character with hexadecimal digits, where dd is a hexadecimal number (00 to 7F for ASCII characters).

In C programs, a string is declared as an array of characters ending with a null character. (Arrays will be discussed in detail in Section 5.2.) Program characters.c manipulates a string and prints out the original and the new string. Line 4 declares string as an array of 10 characters and assigns the initial value "ABC". Line 6 outputs the original string. Lines 7 through 13 change the contents of string.

 1  #include <stdio.h>
 2
 3  int main(void) {
 4    char string[10] = {'A', 'B', 'C', '\0'};
 5
 6    printf("The original string: %s\n", string);
 7    string[3] = '\t';   // horizontal tab
 8    string[4] = '\\';   // backslash
 9    string[5] = '\"';   // double quote
10    string[6] = 'X';    // character X
11    string[7] = '\131'; // character Y
12    string[8] = '\x5A'; // character Z
13    string[9] = '\0';   // null
14    printf("The new string: %s\n", string);
15
16    return 0;
17  }

The original string: ABC
The new string: ABC   \"XYZ

Octal number 131_8 is the ASCII code of character Y and hexadecimal number 5A_16 is the ASCII code of character Z. Hence, Lines 11 and 12 are equivalent to:

11    string[7] = 'Y';
12    string[8] = 'Z';

In the C programming language, characters are treated as integers, i.e., it is possible to do arithmetic on characters. Hence, Lines 11 and 12 can also be written with character arithmetic:

11    string[7] = string[6] + 1;
12    string[8] = string[7] + 1;

The content of character array string is shown below:

array index

0 1 2 3 4 5 6 7 8 9

original string

A B C \0            

new string

A B C \t \ " X Y Z \0

ASCII code values range from 0 to 127. Since a character is 8 bits, a (signed) char can hold values from -128 to 127. In other words, it is possible to store multi-byte characters, such as Chinese characters, in a character string. However, the C programming language does not allow direct assignment of a multi-byte character to a character variable or array element. For example, the assignment

char c = '中';

will cause compiler warnings when compiled with Dev-C++:

[Warning] multi-character character constant

[Warning] overflow in implicit constant conversion

To read multi-byte character strings, a program can use scanf() or functions in the standard library string.h. Program character_big5_encoding.c reads a string and outputs the hexadecimal value of each character code.


#include <stdio.h>

int main(void) {
  unsigned char string[13];
  int i;

  scanf("%s", &string);
  for (i=0; string[i] != '\0'; i++)
    printf("%X ", string[i]);
  printf("\n");

  return 0;
}

資訊工程學系
B8 EA B0 54 A4 75 B5 7B BE C7 A8 74

From the input string and the output codes, it is easy to observe that 資 is encoded as B8EA, 訊 as B054, 工 as A475, 程 as B57B, 學 as BEC7, and 系 as A874, using the Big5 encoding scheme. This encoding can be verified using program character_big5_decoding.c:


#include <stdio.h>

int main(void) {
  unsigned char string[13];

  string[0] = '\xB8';
  string[1] = '\xEA';
  string[2] = '\xB0';
  string[3] = '\x54';
  string[4] = '\xA4';
  string[5] = '\x75';
  string[6] = '\xB5';
  string[7] = '\x7B';
  string[8] = '\xBE';
  string[9] = '\xC7';
  string[10] = '\xA8';
  string[11] = '\x74';
  string[12] = '\0';
  printf("%s\n", string);


  return 0;
}

資訊工程學系

Integer Data Type

Integer types in C have several variations. The most commonly used type is int. All signed integer types are stored using two's complement, and their sizes and ranges are listed in the table of primitive types. Program integer_max_min.c shows the maximum and minimum values of type int.


#include <stdio.h>

int main(void) {
  int i;

  i = 2147483647;
  printf("maximum integer value: %d\n", i);
  i = 2147483648;
  printf("integer overflow: %d\n", i);
  i = -2147483648;
  printf("minimum integer value: %d\n", i);
  i = -2147483649;
  printf("integer underflow: %d\n", i);


  return 0;
}

maximum integer value: 2147483647
integer overflow: -2147483648
minimum integer value: -2147483648
integer underflow: 2147483647

The next program, integer_short_long.c, shows that assigning a variable of type int to a variable of the short integer type (Line 10) may cause an overflow problem. Assigning a variable of type int to a variable of the long integer type (Line 12), however, is not a problem.

 1  #include <stdio.h>
 2
 3  int main(void) {
 4    int i;
 5    short j;
 6    long k;
 7
 8    i = 158423;
 9    printf("normal integer: %d\n", i);
10    j = i;
11    printf("short integer (overflow): %d\n", j);
12    k = i;
13    printf("long integer: %ld\n", k);
14
15    return 0;
16  }

normal integer: 158423
short integer (overflow): 27351
long integer: 158423

Floating-point Data Type

Floating-point numbers in a C program can be printed in fixed-point format or floating-point (scientific) format. Program float_format.c shows the output of 134.56789 in fixed-point format (Line 7) and floating-point format (Line 8). Also note that the output values are not exactly 134.56789, since the floating-point representation is only an approximation. This fact makes equality tests of floating-point numbers more complicated; we will discuss this issue in Section 4.2.

 1  #include <stdio.h>
 2
 3  int main(void) {
 4    float r;
 5
 6    r = 134.56789;
 7    printf("fixed-point format: %f\n", r);
 8    printf("floating-point format: %E\n", r);
 9
10    return 0;
11  }

fixed-point format: 134.567886
floating-point format: 1.345679E+002

Additional format specifications can be added to an integer or floating-point conversion specifier to make the output more readable. For example, %4d outputs an integer in a field of at least four character positions, padding with spaces on the left-hand side if the length of the integer (including a sign symbol) is less than 4; if the length is greater than or equal to four, the output simply occupies the length of the integer. Specifier %6.2f prints a floating-point number in a field of at least six positions with two digits after the decimal point (two-digit precision). Program specifier.c shows some examples of width and precision formats. Lines 5 and 6 produce the first two output lines, which begin with two spaces, and Line 7 produces the third output line, which occupies five character positions. Lines 8 and 9 generate the fourth and fifth output lines, which begin with a space and show each floating-point number with two fractional digits. The output of Line 10 has a total of seven characters.

 1  #include <stdio.h>
 2
 3  int main(void) {
 4
 5    printf("%4d\n", 12);
 6    printf("%4d\n", -8);
 7    printf("%4d\n", 12345);
 8    printf("%6.2f\n", -2.5);
 9    printf("%6.2f\n", 32.533);
10    printf("%6.2f\n", 3232.533);
11
12    return 0;
13  }

  12
  -8
12345
 -2.50
 32.53
3232.53

In the theory of programming languages, a strongly typed language allows a compiler to check type errors at compile time. The C programming language, however, does not strictly enforce strong type checking, for it allows values of different types to be assigned or passed to each other. Program primitve_type_not_casting.c tries to print variables of character type, integer type, and floating-point type using mismatched conversion specifiers. The program compiles without any error, but the output shows that some of the values are printed incorrectly.

 1  #include <stdio.h>
 2
 3  int main(void) {
 4    char c;
 5    int i;
 6    float r;
 7
 8    c = 'A';
 9    printf("Character as integer: %d\n", c);
10    printf("Character as floating-point: %E\n", c);
11    i = 49;
12    printf("Integer as character: %c, %X\n", i, i);
13    printf("Integer as floating-point: %E\n", i);
14    r = 97.512;
15    printf("Floating-point as character: %c, %X\n", r, r);
16    printf("Floating-point as integer: %d\n", r);
17
18    return 0;
19  }

The output shows that printing a character as an integer (Line 9) or an integer as a character (Line 12) yields a correct answer. However, printing a character or an integer as a floating-point number (Lines 10 and 13), or printing a floating-point number as a character or an integer (Lines 15 and 16), yields an incorrect answer.

Character as integer: 65
Character as floating-point: 5.284007E-308
Integer as character: 1, 31
Integer as floating-point: 1.039778E-312
Floating-point as character: , 405860C4
Floating-point as integer: -1610612736

To ensure program correctness, a proper type cast must be added in front of a variable. Type casting explicitly places the name of the intended type, in parentheses, in front of a variable or an expression to which it will be converted. For example, in Lines 10 and 13 of program primitve_type_casting.c, character variable c and integer variable i are explicitly cast to type float. In Lines 15 and 16, floating-point variable r is cast to types char and int.

 1  #include <stdio.h>
 2
 3  int main(void) {
 4    char c;
 5    int i;
 6    float r;
 7
 8    c = 'A';
 9    printf("Character as integer: %d\n", c);
10    printf("Character as floating-point: %E\n", (float) c);
11    i = 49;
12    printf("Integer as character: %c, %X\n", i, i);
13    printf("Integer as floating-point: %E\n", (float) i);
14    r = 97.512;
15    printf("Floating-point as character: %c, %X\n", (char) r, (char) r);
16    printf("Floating-point as integer: %d\n", (int) r);
17
18    return 0;
19  }

The output of program primitve_type_casting.c is shown below:

Character as integer: 65
Character as floating-point: 6.500000E+001
Integer as character: 1, 31
Integer as floating-point: 4.900000E+001
Floating-point as character: a, 61
Floating-point as integer: 97

With proper casts, the output gives correct values. Character A is converted to the floating-point number 6.500000E+001 and integer 49 is converted to the floating-point number 4.900000E+001. When a floating-point number is cast to char or int, its fractional part is simply discarded. Integer 97 is exactly the decimal value of the ASCII code of character a. Of course, the floating-point numbers could also be printed using the fixed-point specifier %f.

