Basic Computing Concepts and Programming in C: Unit 2

2 Data Representations in Computers

The contents of computer storage are strings of 1's and 0's. Different types of data, e.g., characters, integers, and real numbers, have different representations in computers. In this chapter, we first give an overview of the prefixes, called binary prefixes, that quantify computer storage, together with the decimal prefixes used for small quantities such as time and length. The numeral system most frequently used inside a computer is not the decimal system. The three most common numeral systems in computing are binary numerals, octal numerals, and hexadecimal numerals. We will explain these three numeral systems and the data representations of characters, integers, and real numbers. Finally, we describe the primitive data types of the C programming language.

2.1 Binary Prefix

The smallest unit of data storage in a computer is a bit. A single bit has one of two mutually exclusive states, written as 0 or 1, or as true or false. A collection of bits, usually of length 8, is called a byte, which is the basic unit of computer storage. Another term, "word", refers to a group of consecutive bytes whose length matches the CPU's largest register or largest addressable memory location. For example, for IBM-compatible PCs, a word is equivalent to 16 bits, i.e., 2 bytes, a convention inherited from the early Intel 80x86 processors. However, for many recent CPUs, a word is equivalent to 32 bits or even 64 bits.

There are binary prefixes to quantify large amounts of bits, bytes, and words.

Name	Base 2	Base 10	Numeral	Name	Base 2	Base 10	Numeral
kilo	2¹⁰	≈10³	thousand	peta	2⁵⁰	≈10¹⁵	quadrillion
mega	2²⁰	≈10⁶	million	exa	2⁶⁰	≈10¹⁸	quintillion
giga	2³⁰	≈10⁹	billion	zetta	2⁷⁰	≈10²¹	sextillion
tera	2⁴⁰	≈10¹²	trillion	yotta	2⁸⁰	≈10²⁴	septillion

The first three prefixes are the ones most often used in computer science. The term kilobyte means 2¹⁰ bytes, i.e., 1,024 bytes, about 1,000 bytes; the term megabyte means 2²⁰ bytes, i.e., 1,048,576 bytes, about 1,000,000 bytes; the term gigabyte means 2³⁰ bytes, i.e., 1,073,741,824 bytes, about 1,000,000,000 bytes. The three terms are usually abbreviated as KB, MB, and GB, respectively. Hence, a 1.8 KB file is about 1,800 bytes in size; a 128 MB RAM holds about 128,000,000 bytes; and a 40 GB hard disk holds about 40,000,000,000 bytes.

There are also prefixes for quantifying small amounts of time, length, and weight. These prefixes are expressed in base 10.

Name	Base 10	Numeral	Name	Base 10	Numeral
milli	10^-3	thousandth	nano	10^-9	billionth
micro	10^-6	millionth	pico	10^-12	trillionth

One millisecond, 1 ms, is one thousandth of a second, i.e., 10^-3 seconds. One microsecond, 1 μs, is one millionth of a second, i.e., 10^-6 seconds. One nanosecond, 1 ns, is one billionth of a second, i.e., 10^-9 seconds.

Several terminologies are frequently used in computer science. We give some examples below:

Description	Notation	Meaning
network bandwidth	128 kbps	128 kilo bits per second
frequency of CPU clock	600 MHz	600 mega hertz = 600×10⁶ clock cycles per second; 1/(600×10⁶) seconds per clock cycle ≈ 1.67 nanoseconds per clock cycle
microprocessor speed (in terms of instruction)	600 MIPS	600 mega (million) instructions per second
microprocessor speed (in terms of operation)	1.4 GFlops	1.4 giga (billion) floating point operations per second

2.2 Binary Numerals, Octal Numerals, and Hexadecimal Numerals

The numeral system we use daily is decimal numerals, i.e., base 10 numbers. In the decimal number system, there are ten symbols, 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, with decimal points and plus (+) and minus (-) signs to express decimal numbers.

Three other numeral systems are used in computer science: binary numerals (base 2), octal numerals (base 8), and hexadecimal numerals (base 16). For binary numerals, there are only two digit symbols, 0 and 1. Octal numerals are composed of 8 digit symbols, 0, 1, 2, 3, 4, 5, 6, and 7. Hexadecimal numerals are composed of 16 digit symbols, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. The letters A, B, C, D, E, and F of hexadecimal numerals denote decimal values 10, 11, 12, 13, 14, and 15, respectively. The following table shows the equivalent decimal, binary, octal, and hexadecimal numbers from 0₁₀ to 15₁₀.

Dec.	Bin.	Oct.	Hex.	Dec.	Bin.	Oct.	Hex.
0	0000	00	0	8	1000	10	8
1	0001	01	1	9	1001	11	9
2	0010	02	2	10	1010	12	A
3	0011	03	3	11	1011	13	B
4	0100	04	4	12	1100	14	C
5	0101	05	5	13	1101	15	D
6	0110	06	6	14	1110	16	E
7	0111	07	7	15	1111	17	F

A decimal number, say 25084, is actually an expression of powers of 10, i.e.,

25084=2×10⁴+5×10³+0×10²+8×10¹+4×10⁰.

Similarly, a binary number, say 101101₂, is actually an expression of powers of 2, i.e.,

101101₂=1×2⁵+0×2⁴+1×2³+1×2²+0×2¹+1×2⁰=45.

Note that when there is a chance of ambiguity, a subscript is used to indicate the base of a number. Often, terms with coefficient 0 are omitted and 101101₂ is written as:

101101₂=1×2⁵+1×2³+1×2²+1×2⁰.

An octal number, say 763₈, is an expression of powers of 8, i.e.,

763₈=7×8²+6×8¹+3×8⁰=499.

A hexadecimal number, say 9E4B₁₆, is an expression of powers of 16, i.e.,

9E4B₁₆=9×16³+14×16²+4×16¹+11×16⁰=40523.

For a number with a fixed length of digits, leading zeros will be added if the length of this number is shorter than the given fixed length. For example, if we designate a binary number of 8 bits, then 101101₂ is often written as 00101101₂. We show examples of converting a number from a numeral system to the other numeral systems.

Example 1: Convert decimal number 40206 to a binary number, an octal number, and a hexadecimal number.

The stepwise conversion to a binary number is illustrated as below:

Input: decimal number d = 40206.
Output: binary number b of the same value as d.
1. Let b be an empty ("NULL") binary string.
2. Divide d by 2. Attach the remainder to the left-hand side of b and set d to the quotient.
3. If d ≥ 2, repeat Step 2; otherwise, attach d to the left-hand side of b.
Operational steps:	Initial: (d, b) = (40206, )
40206 ÷ 2 = 20103 remainder 0	(20103, 0)
20103 ÷ 2 = 10051 remainder 1	(10051, 10)
10051 ÷ 2 = 5025 remainder 1	(5025, 110)
5025 ÷ 2 = 2512 remainder 1	(2512, 1110)
2512 ÷ 2 = 1256 remainder 0	(1256, 01110)
1256 ÷ 2 = 628 remainder 0	(628, 001110)
628 ÷ 2 = 314 remainder 0	(314, 0001110)
314 ÷ 2 = 157 remainder 0	(157, 00001110)
157 ÷ 2 = 78 remainder 1	(78, 100001110)
78 ÷ 2 = 39 remainder 0	(39, 0100001110)
39 ÷ 2 = 19 remainder 1	(19, 10100001110)
19 ÷ 2 = 9 remainder 1	(9, 110100001110)
9 ÷ 2 = 4 remainder 1	(4, 1110100001110)
4 ÷ 2 = 2 remainder 0	(2, 01110100001110)
2 ÷ 2 = 1 remainder 0	(1, 001110100001110)
∴ 40206₁₀=1001 1101 0000 1110₂

The stepwise conversion to an octal number is illustrated as below:

Input: decimal number d = 40206.
Output: octal number o of the same value as d.
1. Let o be an empty ("NULL") octal string.
2. Divide d by 8. Attach the remainder to the left-hand side of o and set d to the quotient.
3. If d ≥ 8, repeat Step 2; otherwise, attach d to the left-hand side of o.
Operational steps:	Initial: (d, o) = (40206, )
40206 ÷ 8 = 5025 remainder 6	(5025, 6)
5025 ÷ 8 = 628 remainder 1	(628, 16)
628 ÷ 8 = 78 remainder 4	(78, 416)
78 ÷ 8 = 9 remainder 6	(9, 6416)
9 ÷ 8 = 1 remainder 1	(1, 16416)
∴ 40206₁₀=116 416₈

The stepwise conversion to a hexadecimal number is illustrated as below:

Input: decimal number d = 40206.
Output: hexadecimal number h of the same value as d.
1. Let h be an empty ("NULL") hexadecimal string.
2. Divide d by 16. Attach the remainder, written as a hexadecimal digit, to the left-hand side of h and set d to the quotient.
3. If d ≥ 16, repeat Step 2; otherwise, attach d to the left-hand side of h.
Operational steps:	Initial: (d, h) = (40206, )
40206 ÷ 16 = 2512 remainder 14	(2512, E)
2512 ÷ 16 = 157 remainder 0	(157, 0E)
157 ÷ 16 = 9 remainder 13	(9, D0E)
∴ 40206₁₀=9D0E₁₆

If the binary number 1001110100001110 is grouped into 3 bits each from right to left, 1 001 110 100 001 110, it is easy to convert each group of three bits to an octal digit and obtain 116416₈. Similarly, if the binary number 1001110100001110 is grouped into 4 bits each from right to left, 1001 1101 0000 1110, it is easy to convert each group of four bits to a hexadecimal digit and obtain 9D0E₁₆. Conversely, it is possible to convert a decimal number into an octal or a hexadecimal number and then convert each octal or hexadecimal digit into binary bits. This procedure requires fewer steps to convert a large decimal number to a binary one.

Similarly to decimal arithmetic operations, we can add and multiply a pair of binary numbers, octal numbers, and hexadecimal numbers. Addition of two binary numbers, 10010010+00111011, is shown as the following:

carry			1	1			1
		1	0	0	1	0	0	1	0
	+	0	0	1	1	1	0	1	1
		1	1	0	0	1	1	0	1

Addition of two octal numbers, 2614+1356, is similar:

carry		1		1
		2	6	1	4
	+	1	3	5	6
		4	1	7	2

Let us verify the addition in decimal: 2614₈ + 1356₈ = 1420₁₀ + 750₁₀ = 2170₁₀ = 4172₈. We leave the addition of two hexadecimal numbers as an exercise.

Multiplication of two binary numbers, 10110×1010, is shown below:

				1	0	1	1	0
×					1	0	1	0
				0	0	0	0	0
			1	0	1	1	0
		0	0	0	0	0
	1	0	1	1	0
	1	1	0	1	1	1	0	0

For multiplication of octal numbers, we first consider products of two single octal digits, e.g., 5₈×7₈=43₈, 6₈×7₈=52₈, and 3₈×7₈=25₈, and then products of a multi-digit number and a single digit, e.g., 563₈×7₈=4300₈+520₈+25₈=5045₈. Multiplication of two octal numbers, 563×247, is illustrated below:

				5	6	3
×				2	4	7
			5	0	4	5
		2	7	1	4
	1	3	4	6
	1	7	1	0	0	5

Binary, octal, and hexadecimal numbers with a fractional part are written with a radix point in the same way as decimal numbers are written with a decimal point. For example, binary fractional number 11.101 is converted to a decimal fractional number as below:

11.101₂=1×2¹+1×2⁰+1×2^-1+0×2^-2+1×2^-3=2+1+0.5+0.125=3.625₁₀.

The conversion of a decimal fractional number to a binary number is done in two steps. First, convert the integer part, and then convert the fractional part. For example, the conversion of 3.625₁₀ to a binary fractional number is done by first converting the integer part 3₁₀ to 11₂ as before. The second step is to convert the fractional part 0.625₁₀ to a binary fraction. The fraction conversion is described and illustrated below.

Input: decimal fraction d = 0.625.
Output: binary fraction b of the same value as d.
1. Let b be the binary value "0." with no fraction digits yet.
2. Multiply d by 2. Attach the integer part of the product (either 0 or 1) to the right-hand side of b and set d to the fractional part of the product.
3. If d ≠ 0, repeat Step 2; otherwise, stop.
Operational steps:	Initial: (d, b) = (0.625, 0.)
0.625 × 2 = 1.25	(0.25, 0.1)
0.25 × 2 = 0.5	(0.5, 0.10)
0.5 × 2 = 1.0	(0.0, 0.101)
∴ 0.625₁₀=0.101₂

Therefore, decimal number 3.625₁₀ is converted to 11.101₂. However, conversion of decimal fractions to binary fractions is not always perfect. Let us try to convert 0.8₁₀ to a binary fraction.

Operational steps:	Initial: (d, b) = (0.8, 0.)
0.8 × 2 = 1.6	(0.6, 0.1)
0.6 × 2 = 1.2	(0.2, 0.11)
0.2 × 2 = 0.4	(0.4, 0.110)
0.4 × 2 = 0.8	(0.8, 0.1100)
0.8 × 2 = 1.6	(0.6, 0.11001)
0.6 × 2 = 1.2	(0.2, 0.110011)
0.2 × 2 = 0.4	(0.4, 0.1100110)
0.4 × 2 = 0.8	(0.8, 0.11001100)
…	…
0.4 × 2 = 0.8	(0.8, 0.110011001100…1100)
∴ 0.8₁₀=0.110011001100…₂ (a repeating binary fraction)

In practice, a decimal fraction is converted to a binary fraction of a specific length that approximates it. For example, if the binary fraction is fixed to 8 bits, 0.8₁₀ is converted to 0.11001100₂, which is actually 0.796875₁₀, an approximation of 0.8₁₀.

Binary numbers with a negative sign are also expressed in a specific number of bits. We will consider 8-bit signed binary numbers. A positive number, say 37₁₀, is expressed as 00100101₂. There are two methods for representing negative binary numbers. The first representation is one's complement, which is the bitwise NOT of the binary number, e.g., -37₁₀ = NOT(00100101₂) = 11011010₂. With 8-bit signed numbers in one's complement, only integers ranging from -127₁₀ (10000000₂) to 127₁₀ (01111111₂) can be expressed. For example, 320₁₀ cannot be expressed as an 8-bit signed binary number, since 320₁₀ = 1 0100 0000₂, which requires at least 9 bits. In one's complement, the most significant bit is used as the sign bit. If the most significant bit is 0, the binary number is a positive number; if the most significant bit is 1, the binary number is a negative number. One's complement has the drawback of two representations of zero, 00000000₂ and 11111111₂.

The second representation of signed binary numbers is two's complement. For an 8-bit binary number, the most significant bit is again used as the sign bit. A negative number is represented in two's complement by taking the one's complement of its absolute value and then adding one. For example, -37₁₀ = NOT(00100101₂) + 1₂ = 11011010₂ + 1₂ = 11011011₂. 8-bit signed numbers in two's complement range from -128₁₀ (10000000₂) to 127₁₀ (01111111₂). Note that +128₁₀ cannot be represented as an 8-bit two's complement number.

Two's complement representation is suitable for binary number arithmetic. Consider 8-bit signed binary numbers. Addition of two positive numbers is the same as the previous case 27₁₀+ 12₁₀= 00011011₂+ 00001100₂= 00100111₂. Additions of negative numbers are illustrated as below:

carry	1	1	1	1
		0	0	0	1	1	0	1	1	(27₁₀)
	+	1	1	1	1	0	1	0	0	(-12₁₀=NOT(00001100₂)+1₂=11110011₂+1₂=11110100₂)
		0	0	0	0	1	1	1	1	(15₁₀)

carry					1	1
		1	1	1	0	0	1	0	1	(-27₁₀=NOT(00011011₂)+1₂=11100100₂+1₂=11100101₂)
	+	0	0	0	0	1	1	0	0	(12₁₀)
		1	1	1	1	0	0	0	1	(-15₁₀=NOT(00001111₂)+1₂=11110000₂+1₂=11110001₂)

carry	1	1	1			1
		1	1	1	0	0	1	0	1	(-27₁₀)
	+	1	1	1	1	0	1	0	0	(-12₁₀)
		1	1	0	1	1	0	0	1	(-39₁₀=NOT(00100111₂)+1₂=11011000₂+1₂=11011001₂)

Subtraction of two positive numbers is done by first converting the second operand to its two's complement and then adding the two numbers. For example,

27₁₀-12₁₀=27₁₀+(-12₁₀)=00011011₂+11110100₂=00001111₂=15₁₀.

However, the following addition results in an error of overflow. It is considered as an invalid operation.

carry		1					1
		0	1	0	0	1	0	1	1	(75₁₀)
	+	0	1	0	1	0	0	1	0	(82₁₀)
		1	0	0	1	1	1	0	1	(Overflow! 10011101₂ is a negative number -99₁₀.)

To multiply signed numbers, convert any negative operand to its absolute value, calculate the product of the two positive numbers, and then, if exactly one of the operands is negative, convert the product to its two's complement representation.

2.3 Characters

Characters are represented in a computer as bit strings. The most popular encoding scheme for English characters is ASCII (American Standard Code for Information Interchange). Each ASCII character occupies one byte, but only seven bits are used; the most significant bit is always 0. The ASCII character set is divided into two categories: control characters (decimal values 0 to 31 and 127) and printable characters (decimal values 32 to 126). The ASCII table is shown below.

Dec.	Oct.	Hex.	Bin.	Character	Description
0	000	00	00000000	NUL	Null character
1	001	01	00000001	SOH	Start of header
2	002	02	00000010	STX	Start of text
3	003	03	00000011	ETX	End of text
4	004	04	00000100	EOT	End of transmission
5	005	05	00000101	ENQ	Enquiry
6	006	06	00000110	ACK	Acknowledgement
7	007	07	00000111	BEL	Bell
8	010	08	00001000	BS	Backspace
9	011	09	00001001	HT	Horizontal tab
10	012	0A	00001010	LF	Line feed
11	013	0B	00001011	VT	Vertical tab
12	014	0C	00001100	FF	Form feed
13	015	0D	00001101	CR	Carriage return
14	016	0E	00001110	SO	Shift out
15	017	0F	00001111	SI	Shift In
16	020	10	00010000	DLE	Data link escape
17	021	11	00010001	DC1	Device control 1 (XON)
18	022	12	00010010	DC2	Device control 2
19	023	13	00010011	DC3	Device control 3
20	024	14	00010100	DC4	Device control 4 (XOFF)
21	025	15	00010101	NAK	Negative acknowledgement
22	026	16	00010110	SYN	Synchronous idle
23	027	17	00010111	ETB	End of transmission block
24	030	18	00011000	CAN	Cancel
25	031	19	00011001	EM	End of medium
26	032	1A	00011010	SUB	Substitute
27	033	1B	00011011	ESC	Escape
28	034	1C	00011100	FS	File separator
29	035	1D	00011101	GS	Group separator
30	036	1E	00011110	RS	Record separator
31	037	1F	00011111	US	Unit separator
32	040	20	00100000	SP	Space
33	041	21	00100001	!
34	042	22	00100010	"
35	043	23	00100011	#
36	044	24	00100100	$
37	045	25	00100101	%
38	046	26	00100110	&
39	047	27	00100111	'
40	050	28	00101000	(
41	051	29	00101001	)
42	052	2A	00101010	*
43	053	2B	00101011	+
44	054	2C	00101100	,
45	055	2D	00101101	-
46	056	2E	00101110	.
47	057	2F	00101111	/
48	060	30	00110000	0
49	061	31	00110001	1
50	062	32	00110010	2
51	063	33	00110011	3
52	064	34	00110100	4
53	065	35	00110101	5
54	066	36	00110110	6
55	067	37	00110111	7
56	070	38	00111000	8
57	071	39	00111001	9
58	072	3A	00111010	:
59	073	3B	00111011	;
60	074	3C	00111100	<
61	075	3D	00111101	=
62	076	3E	00111110	>
63	077	3F	00111111	?
64	100	40	01000000	@
65	101	41	01000001	A
66	102	42	01000010	B
67	103	43	01000011	C
68	104	44	01000100	D
69	105	45	01000101	E
70	106	46	01000110	F
71	107	47	01000111	G
72	110	48	01001000	H
73	111	49	01001001	I
74	112	4A	01001010	J
75	113	4B	01001011	K
76	114	4C	01001100	L
77	115	4D	01001101	M
78	116	4E	01001110	N
79	117	4F	01001111	O
80	120	50	01010000	P
81	121	51	01010001	Q
82	122	52	01010010	R
83	123	53	01010011	S
84	124	54	01010100	T
85	125	55	01010101	U
86	126	56	01010110	V
87	127	57	01010111	W
88	130	58	01011000	X
89	131	59	01011001	Y
90	132	5A	01011010	Z
91	133	5B	01011011	[
92	134	5C	01011100	\
93	135	5D	01011101	]
94	136	5E	01011110	^
95	137	5F	01011111	_
96	140	60	01100000	`
97	141	61	01100001	a
98	142	62	01100010	b
99	143	63	01100011	c
100	144	64	01100100	d
101	145	65	01100101	e
102	146	66	01100110	f
103	147	67	01100111	g
104	150	68	01101000	h
105	151	69	01101001	i
106	152	6A	01101010	j
107	153	6B	01101011	k
108	154	6C	01101100	l
109	155	6D	01101101	m
110	156	6E	01101110	n
111	157	6F	01101111	o
112	160	70	01110000	p
113	161	71	01110001	q
114	162	72	01110010	r
115	163	73	01110011	s
116	164	74	01110100	t
117	165	75	01110101	u
118	166	76	01110110	v
119	167	77	01110111	w
120	170	78	01111000	x
121	171	79	01111001	y
122	172	7A	01111010	z
123	173	7B	01111011	{
124	174	7C	01111100	|
125	175	7D	01111101	}
126	176	7E	01111110	~
127	177	7F	01111111	DEL	Delete

A computer does not process only English text; it must also be able to deal with text in other languages such as Chinese, Japanese, and Spanish. To accommodate other languages, various character encoding schemes have been developed. For example, two Chinese character encoding schemes are used in different regions: Big5 for traditional Chinese characters and GB for simplified Chinese characters. Big5 and GB are both 16-bit encoding schemes.

Unicode is an international standard for encoding the characters of all human written languages. It is maintained by the Unicode Consortium and kept synchronized with the ISO/IEC 10646 standard of the International Organization for Standardization (ISO). Unicode has been adopted by many computer systems to support multilingual environments, including Microsoft Windows versions such as Windows NT, Windows 2000, and Windows XP, and Unix-based operating systems such as GNU/Linux, BSD, and Mac OS X. Unicode text can be stored in several encoding forms, including UTF-8, which is built from 8-bit units, and UTF-16, which is built from 16-bit units.

2.4 Integers

In mathematics, the set of integers has an infinite number of elements. However, a computer has only finite resources, i.e., a given amount of memory, so it is not possible to represent all integers in a computer. There are two basic types of integer representations: unsigned and signed integers.

Unsigned integers are non-negative integers and are represented by their binary values with a fixed number of bits. For early Intel 80x86 processors, a word is 16 bits long and an integer is represented as a 16-bit binary number. However, a 16-bit string is too long to read comfortably. For easy reading, a 16-bit binary number is usually written as a four-digit hexadecimal number. For example, 5876 is represented as binary number 0001 0110 1111 0100 and is written as hexadecimal number 16F4. As another example, 39741 is represented as binary number 1001 1011 0011 1101 and hexadecimal number 9B3D. Note that the most significant bit of 39741 is one, but it is a positive integer. For 16-bit unsigned integers, the values range from 0 to 2¹⁶-1 (65535₁₀).

Signed integers include zero, negative integers, and positive integers, and are represented using two's complement. For a 16-bit processor, the most significant bit is used as the sign bit. The values of a signed integer range from -2¹⁵ (-32768₁₀) to 2¹⁵-1 (32767₁₀). Usually, the representation of a signed integer is written in hexadecimal for easy reading. Positive integer 5876₁₀ is still represented as 16F4₁₆. However, 39741₁₀ is too large to be represented as a 16-bit signed integer; its corresponding hexadecimal number 9B3D₁₆ is instead the two's complement representation of the negative integer -25795₁₀.

2.5 Real Numbers

Real numbers are represented in a computer using two methods: fixed-point format and floating-point format. In a computer, it is not possible to represent every real number precisely. For example, 1/3 is 0.3333..., which cannot be stored exactly in a finite amount of memory. Both fixed-point numbers and floating-point numbers are only approximations of real numbers.

The representation of fixed-point numbers uses a specific number of bits, of which some store the integer part and the rest store the fractional part after the radix point. Consider a 16-bit representation of fixed-point numbers in which the higher eight bits are the integer part and the lower eight bits are the fractional part. For example, fixed-point number 0100 0011 1001 0100 is the representation of

01000011.10010100₂ = 2⁶ + 2¹ + 2⁰ + 2^-1 + 2^-4 + 2^-6

= 64 + 2 + 1 + 0.5 + 0.0625 + 0.015625 = 67.578125₁₀

For signed fixed-point numbers, one bit is used to represent the sign. In the 16-bit case, only 15 bits are left to store the integer and fractional parts, e.g., 7 bits for the integer part and 8 bits for the fractional part. To express a decimal number in the fixed-point format, the decimal real number is converted to a binary number with a radix point and the specified bit length. The decimal real number -18.625₁₀ is converted to -10010.101₂ and its fixed-point representation is 1 0010010 10100000.

A floating-point number contains three parts: a sign, an exponent, and a fraction. Floating-point numbers are written in scientific notation: ±fractionEexponent. For example, 1.432456E5 means 1.432456×10⁵=143245.6 and 3.214065E-3 means 3.214065×10^-3=0.003214065.

In a computer, floating point number representation uses scientific notation of the binary number system. A popular standard of floating-point number representation is IEEE 754. The 32-bit single precision floating-point format of IEEE 754 is shown as below:

bit	31	30 to 23	22 to 0
field	sign	exponent	fraction
content	s	e₇e₆e₅e₄e₃e₂e₁e₀	f₂₂f₂₁f₂₀ … f₂f₁f₀

where the most significant bit (bit 31) is the sign bit, bits 23 to 30 are the exponent bits, and bits 0 to 22 are the fraction bits. These three parts have the following meanings:

The sign bit is either 0 or 1, denoting a positive or negative floating-point number, respectively.
The 8-bit exponent (bits 23 to 30) takes values from 1 to 254 for ordinary numbers (0 and 255 are reserved for special values) and has bias 127, i.e., the actual exponent is the value of the 8-bit exponent e₇e₆e₅e₄e₃e₂e₁e₀ minus 127.
The 23-bit fraction (bits 0 to 22) is the value of a binary fraction, i.e., the significand has an integer part of value 1 and the fractional part f₂₂f₂₁f₂₀f₁₉f₁₈f₁₇f₁₆f₁₅f₁₄f₁₃f₁₂f₁₁f₁₀f₉f₈f₇f₆f₅f₄f₃f₂f₁f₀.

Let sign(s) be the plus sign (+) if the sign bit s is 0, and the minus sign (-) if the sign bit is 1. Let e be the binary number e₇e₆e₅e₄e₃e₂e₁e₀. Also, let f be the binary fractional number 0.f₂₂f₂₁f₂₀f₁₉f₁₈f₁₇f₁₆f₁₅f₁₄f₁₃f₁₂f₁₁f₁₀f₉f₈f₇f₆f₅f₄f₃f₂f₁f₀. The value of a floating-point number in the 32-bit IEEE 754 format is calculated as

sign(s) (1 + f) × 2^(e - 127).

The value of 0 01111011 01100000000000000000000 is

1.011₂ × 2^(123-127) = 1.011₂ × 2^-4 = 0.0001011₂ = 0.0859375₁₀,

where the exponent 01111011₂ equals 123₁₀. The value of 1 10000101 10110000000000000000000 is

-1.1011₂ × 2^(133-127) = -1.1011₂ × 2⁶ = -1101100₂ = -108₁₀,

where the exponent 10000101₂ equals 133₁₀.

To express a decimal number in the floating-point format, the decimal number is first converted to a binary number with a radix point. The exponent part and the fraction part are then calculated. Consider the decimal number -18.625₁₀.

-18.625₁₀ = -10010.101₂ = -1.0010101₂ × 2⁴.

We obtain that the sign bit is 1, the exponent is 4 + 127 = 131₁₀ = 10000011₂, and the fraction is 0010101. The floating-point representation of -18.625 is written as 1 10000011 00101010000000000000000. If both the exponent part and the fraction part are 0, the floating-point number is 0.

2.6 Primitive Data Types in C

A C program contains a number of variables, each of which represents a memory location. (Issues on program variables will be discussed in Section 3.3.) Each program variable must be declared as an instance of a data type. Three basic kinds of types are supported in the C programming language: characters, integers, and floating-point numbers. Integer and floating-point types have several variations. The primitive data types, their sizes (in bytes), and their value ranges are listed in the following table:

Data type	Description	Size	Range
char	character	1	-128 to 127
int	integer	4	-2,147,483,648 to 2,147,483,647
short	short integer	2	-32,768 to 32,767
long	long integer	4	-2,147,483,648 to 2,147,483,647
long long	long long integer	8	-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
unsigned char	unsigned character	1	0 to 255
unsigned	unsigned integer	4	0 to 4,294,967,295
unsigned short	unsigned short integer	2	0 to 65,535
unsigned long	unsigned long integer	4	0 to 4,294,967,295
unsigned long long	unsigned long long integer	8	0 to 18,446,744,073,709,551,615
float	single precision floating-point number	4	±1.17549435×10^-38 to ±3.40282347×10³⁸
double	double precision floating-point number	8	±2.2250738585072014×10^-308 to ±1.7976931348623157×10³⁰⁸

The size of a data type depends on the compiler and the target platform, so the sizes in the above table may vary from one computer to another. The following C program size_of.c outputs the size of each data type using the C operator sizeof, which takes a type name and yields the number of bytes occupied by that type. Because sizeof yields a value of type size_t, it is printed with the %zu conversion.

#include <stdio.h>

int main(void) {

    printf("Memory size of C primitive data types (bytes): \n");
    printf(" char: %zu\n", sizeof(char));
    printf(" int: %zu\n", sizeof(int));
    printf(" short: %zu\n", sizeof(short));
    printf(" long: %zu\n", sizeof(long));
    printf(" long long: %zu\n", sizeof(long long));
    printf(" unsigned: %zu\n", sizeof(unsigned));
    printf(" unsigned short: %zu\n", sizeof(unsigned short));
    printf(" unsigned long: %zu\n", sizeof(unsigned long));
    printf(" unsigned long long: %zu\n", sizeof(unsigned long long));
    printf(" float: %zu\n", sizeof(float));
    printf(" double: %zu\n", sizeof(double));

    return 0;
}

The output of size_of.c is shown as below:

Memory size of C primitive data types (bytes):
char: 1
int: 4
short: 2
long: 4
long long: 8
unsigned: 4
unsigned short: 2
unsigned long: 4
unsigned long long: 8
float: 4
double: 8

Character Data Type

In C programs, a character constant is written between a pair of single quotation marks, e.g., 'A', 'a', ';', etc., and is encoded using the ASCII encoding scheme. Most printable characters can appear as a character constant by themselves. However, a small set of characters must be preceded by a backslash in order to avoid ambiguity. A backslash followed by a character is called an escape sequence. The escape sequences of the C programming language, including those for some control characters, are listed below:

Escape sequence	Name	Meaning
\a	Alert	output an audible or visible alert signal.
\b	Backspace	move the cursor back one position without removing the character.
\f	Form feed	move the cursor to the beginning of the next page.
\n	New line	move the cursor to the beginning of the next line.
\r	Carriage return	move the cursor to the beginning of the current line.
\t	Horizontal tab	move the cursor to the next horizontal tab position.
\v	Vertical tab	move the cursor to the next vertical tab position.
\'		output a single quote.
\"		output a double quote.
\?		output a question mark.
\\		output a backslash.
\0	Null	output a null character.
\ddd		define a character by its code in octal, where ddd is an octal number between 0 and 177₈ (127₁₀).
\xdd		define a character by its code in hexadecimal, where dd is a hexadecimal number between 0 and 7F₁₆ (127₁₀).

In C programs, a string is declared as an array of characters ending with a null character. (Arrays will be discussed in detail in Section 5.2.) Program characters.c manipulates a string and prints out the original and new strings. Line 4 declares string as an array of 10 characters and assigns the initial value ABC. Line 6 outputs the original string. Lines 7 through 13 change the contents of string.

#include <stdio.h>

int main(void) {
    char string[10] = {'A', 'B', 'C', '\0'};

    printf("The original string: %s\n", string);
    string[3] = '\t';   // horizontal tab
    string[4] = '\\';   // backslash
    string[5] = '\"';   // double quote
    string[6] = 'X';    // character X
    string[7] = '\131'; // character Y
    string[8] = '\x5A'; // character Z
    string[9] = '\0';   // null
    printf("The new string: %s\n", string);

    return 0;
}

The original string: ABC
The new string: ABC \"XYZ

Octal number 131₈ is the ASCII code of character Y and hexadecimal number 5A₁₆ is the ASCII code of character Z. Hence, Lines 11 and 12 are equivalent to:

string[7] = 'Y';

string[8] = 'Z';

In C programming language, characters are treated as integers, i.e., it is possible to do arithmetic operation on characters. Hence, Lines 11 and 12 can be written with character arithmetic operations:

string[7] = string[6] + 1;

string[8] = string[7] + 1;

The content of character array string is shown below:

array index	0	1	2	3	4	5	6	7	8	9
original string	A	B	C	\0
new string	A	B	C	\t	\	"	X	Y	Z	\0

The ASCII code has values ranging from 0 to 127. Since a character occupies 8 bits, a char can hold values from -128 to 127 (or from 0 to 255 when unsigned). In other words, the bytes of a multi-byte character, such as a Chinese character, can be stored in consecutive elements of a character string. However, the C programming language does not allow direct assignment of a multi-byte character to a single character variable. For example, the assignment

char c = '中';

will cause the following compiler warnings in Dev-C++:

[Warning] multi-character character constant

[Warning] overflow in implicit constant conversion

To read multi-byte character strings, a program can use scanf() or functions in standard library string.h. Program character_big5_encoding.c reads a string and outputs the hexadecimal values of each character code.

#include <stdio.h>

int main(void) {
    unsigned char string[13];
    int i;

    /* Read at most 12 bytes so that the 13-byte buffer cannot overflow. */
    scanf("%12s", (char *)string);
    for (i = 0; string[i] != '\0'; i++)
        printf("%X ", string[i]);
    printf("\n");

    return 0;
}

資訊工程學系
B8 EA B0 54 A4 75 B5 7B BE C7 A8 74

From the input string and output codes, it is easy to observe that 資 is encoded to B8EA, 訊 is encoded to B054, 工 is encoded to A475, 程 is encoded to B57B, 學 is encoded to BEC7, and 系 is encoded to A874, using Big5 encoding scheme. This encoding can be verified using program character_big5_decoding.c:

#include <stdio.h>

int main(void) {
    unsigned char string[13];

    string[0] = '\xB8';
    string[1] = '\xEA';
    string[2] = '\xB0';
    string[3] = '\x54';
    string[4] = '\xA4';
    string[5] = '\x75';
    string[6] = '\xB5';
    string[7] = '\x7B';
    string[8] = '\xBE';
    string[9] = '\xC7';
    string[10] = '\xA8';
    string[11] = '\x74';
    string[12] = '\0';
    printf("%s\n", (char *)string);

    return 0;
}

資訊工程學系

Integer Data Type

Integer types in C come in many variations; the most commonly used is int. Signed integer types are stored in two's complement on practically all current platforms, and their sizes and ranges are listed in the table of primitive types. Program integer_max_min.c shows the maximum and minimum values of type int, assuming a 32-bit int, and what happens when a value outside that range is assigned.

#include <stdio.h>

int main(void) {
    int i;

    i = 2147483647;
    printf("maximum integer value: %d\n", i);
    i = 2147483648;
    printf("integer overflow: %d\n", i);
    i = -2147483648;
    printf("minimum integer value: %d\n", i);
    i = -2147483649;
    printf("integer underflow: %d\n", i);

    return 0;
}

maximum integer value: 2147483647
integer overflow: -2147483648
minimum integer value: -2147483648
integer underflow: 2147483647

The next program, integer_short_long.c, shows that assigning a variable of the normal integer type to a variable of the short integer type (Line 10) may cause overflow, because the value may not fit in the smaller type. Assigning a variable of the normal integer type to a variable of the long integer type (Line 12) is always safe, because long is at least as wide as int.

#include <stdio.h>

int main(void) {
    int i;
    short j;
    long k;

    i = 158423;
    printf("normal integer: %d\n", i);
    j = i;
    printf("short integer (overflow): %d\n", j);
    k = i;
    printf("long integer: %ld\n", k);

    return 0;
}

normal integer: 158423
short integer (overflow): 27351
long integer: 158423

Floating-point Data Type

Floating-point numbers in a C program can be printed in fixed-point format or in floating-point (exponential) format. Program float_format.c shows the output of 134.56789 in fixed-point format (Line 7) and floating-point format (Line 8). Note that the output values are not exactly 134.56789, since a floating-point representation is only an approximation. This fact complicates equality tests on floating-point numbers. We will discuss this issue in Section 4.2.

#include <stdio.h>

int main(void) {
    float r;

    r = 134.56789;
    printf("fixed-point format: %f\n", r);
    printf("floating-point format: %E\n", r);

    return 0;
}

fixed-point format: 134.567886
floating-point format: 1.345679E+002

Additional format specifications can be added to an integer or floating-point specifier to make the output easier to read. For example, %4d outputs an integer in a field of at least four character positions, padding with spaces on the left if the integer (including its sign) is shorter than four characters; if the integer is four characters or longer, the output occupies exactly as many positions as the integer needs. Specifier %6.2f prints a floating-point number with two fractional digits (two-digit precision) in a field of at least six positions, which leaves room for three positions of sign and integer digits, the decimal point, and the two fractional digits. Program specifier.c shows some examples of width and precision formats. Lines 5 and 6 produce the first two output lines, which begin with two spaces, and Line 7 produces the third output line, which occupies five character positions. Lines 8 and 9 generate the fourth and fifth output lines, which begin with one space and have two fractional digits each. The output of Line 10 has seven characters in total.

#include <stdio.h>

int main(void) {

    printf("%4d\n", 12);
    printf("%4d\n", -8);
    printf("%4d\n", 12345);
    printf("%6.2f\n", -2.5);
    printf("%6.2f\n", 32.533);
    printf("%6.2f\n", 3232.533);

    return 0;
}

  12
  -8
12345
 -2.50
 32.53
3232.53

In the theory of programming languages, a strongly typed language allows a compiler to detect type errors at compile time. The C programming language, however, is not strictly strongly typed: it allows variables of different types to be assigned to each other, and a variadic function such as printf() does not convert its arguments to match the format specifiers. Program primitve_type_not_casting.c tries to print variables of character type, integer type, and floating-point type with specifiers of other types. The program compiles without any error, but the output shows that some of the values are incorrect.

#include <stdio.h>

int main(void) {
    char c;
    int i;
    float r;

    c = 'A';
    printf("Character as integer: %d\n", c);
    printf("Character as floating-point: %E\n", c);
    i = 49;
    printf("Integer as character: %c, %X\n", i, i);
    printf("Integer as floating-point: %E\n", i);
    r = 97.512;
    printf("Floating-point as character: %c, %X\n", r, r);
    printf("Floating-point as integer: %d\n", r);

    return 0;
}

The output shows that printing a character as an integer (Line 9) or an integer as a character (Line 12) yields a correct answer. However, printing a character or an integer as a floating-point number (Lines 10 and 13), or printing a floating-point number as a character or an integer (Lines 15 and 16), yields an incorrect answer.

Character as integer: 65
Character as floating-point: 5.284007E-308
Integer as character: 1, 31
Integer as floating-point: 1.039778E-312
Floating-point as character: , 405860C4
Floating-point as integer: -1610612736

To ensure program correctness, a proper type cast must be added in front of the variable. A type cast explicitly places the name of the intended type, in parentheses, in front of the variable or expression to be converted. For example, in Lines 10 and 13 of program primitve_type_casting.c, character variable c and integer variable i are explicitly cast to type float. In Lines 15 and 16, floating-point variable r is cast to types char and int.

#include <stdio.h>

int main(void) {
    char c;
    int i;
    float r;

    c = 'A';
    printf("Character as integer: %d\n", c);
    printf("Character as floating-point: %E\n", (float) c);
    i = 49;
    printf("Integer as character: %c, %X\n", i, i);
    printf("Integer as floating-point: %E\n", (float) i);
    r = 97.512;
    printf("Floating-point as character: %c, %X\n", (char) r, (char) r);
    printf("Floating-point as integer: %d\n", (int) r);

    return 0;
}

The output of program primitve_type_casting.c is shown below:

Character as integer: 65
Character as floating-point: 6.500000E+001
Integer as character: 1, 31
Integer as floating-point: 4.900000E+001
Floating-point as character: a, 61
Floating-point as integer: 97

With proper casts, the output gives correct values. Character A is converted to floating-point number 6.500000E+001 and integer 49 is converted to floating-point number 4.900000E+001. When a floating-point number is cast to char or int, its fractional part is simply discarded, so 97.512 becomes 97. Integer 97 is exactly the decimal value of the ASCII code of character a. Of course, the floating-point numbers can also be printed using the fixed-point specifier %f.


				1	0	1	1	0
×					1	0	1	0
				0	0	0	0	0
			1	0	1	1	0
		0	0	0	0	0
	1	0	1	1	0
	1	1	0	1	1	1	0	0

				5	6	3
×				2	4	7
			5	0	4	5
		2	7	1	4
	1	3	4	6
	1	7	1	0	0	5
