Basic Computing Concepts and Programming in C: Unit 3

Most modern computers store data and programs in memory. Computers of this model are called stored-program computers and called von Neumann architecture originally devised by John von Neumann. The block diagram of a von Neumann architecture is composed of a memory and a central processing unit (CPU) which are connected by a bus. Programs and data are stored in memory. During program execution time, program instructions and data items are loaded to CPU registers and then program instructions are executed by arithmetic and logical unit (ALU) to manipulate data. After data items being processed, their results are stored back to the data area in memory. This chapter will explain CPU and memory in terms of their high-level structure and how their are instrumented in high-level programming languages such as C.

A register is a small memory space of fixed-length. For example, the registers of a 16-bit processor are 16 bits long. A data register is also called an accumulator which holds data that is processed by an ALU. A CPU may have more than one accumulators. An address register hold holds memory address that allows CPU to load data from a designated memory location. A special kind of address register is an index register which holds the index of an array or a vector. An instruction register holds an encoded instruction. The instruction is decoded by the control unit and then executed by ALU. The content of a program counter is the memory location where the instruction is stored.

The content of a data register is loaded from memory or set to a constant. For example, for Intel 80x86 processors (Intel 80386), set hexadecimal value 61₁₆ to register a1 is encoded to an instruction of a machine language as:

A machine instruction is what the content of an instruction register; but, it is hard for human to read. A readable notation, called assembly language, is often used to express an instruction. The above machine instruction is written as assembly instruction:

An instruction is stored in the instruction register, but its operation is carried out at arithmetic and logical unit. An arithmetic and logical operation is a hardware component of electronic circuits that perform various functional operations of integer addition, integer multiplication, bitwise logical operations of AND, OR, NOT, and XOR (exclusive or), and shift and rotation operations.

In an ALU, subtraction is calculated as addition of negative numbers, and division is not supported for most CPU's. Additional hardware components of divider and floating-point unit (FPU) are used for effective precision calculation.

The execution of an instruction in a CPU is controlled by the control unit and proceeds in a cycle of four stages: fetch, decode, execute, and restore.

The execution time of an instruction is the total time to complete these four stages. For a CPU which does not support instruction pipeline, the execution of the next instruction can begin only when all the four stages of the current instruction is completed. The following figure illustrates non-pipelined instruction execution:

Most of modern processors support pipelined instruction execution that an instruction can begin its execution as soon as the first stage of its previous instruction is completed. The following figure shows pipelined execution of four-stage instructions. Starting from the fourth time steps, at most four instructions are executed simultaneously.

This section will focus on the discussion of primary storage which is also know as main memory of a computer. In the development of computer technology, primary storage is usually divided into two types: random access memory and read-only memory.

Random access memory (RAM) is a primary storage whose content can be accessed by a computer in any order. A computer can read/write data to/from RAM. RAM is used to store data that are actively used and actively changed. The size of RAM is for modern desktop computers can be up to several hundred mega-bytes. The contents of RAM are erased when a computer is powered off. A program is usually stored in external storage such as a hard disk. When a program is executed, the executable code of the program is first loaded from external storage to RAM and the computer also allocates memory space in RAM for the data elements that will be manipulated by the program. Instructions of the executable code are loaded to instruction registers of the CPU and corresponding operations are carried by arithmetic and logical units and control units. During the program execution time, the contents of allocated data space in RAM are initialized and manipulated. Depending on the program functions, data may loaded and stored from/to external devices.

There are various types of RAM because of technology difference, e.g., static RAM (SRAM), dynamic RAM (DRAM), and non-volatile RAM (NRAM). As the advance of technology, some types of RAM, e.g., flash memory (flash memory is sometimes classified as EEPROM,) can preserve their contents for a long time, i.e., the data will not be erased when a computer is powered off. These types of RAM are used as hard-disk, although they are part of primary memory.

Read-only memory (ROM) is like RAM that its contents can be accessed in any order. But, as indicated by the terminology, the contents of ROM is read only that it can not be changed. ROM is used to store the bootstrapping program of a computer which is executed when a computer is powered on. The contents of read-only memory are retained when a computer is powered off. The size of ROM in a desktop computer is usually small of a few kilo-bytes.

Another type of primary storage is cache memory which is a fast access memory lying between registers and main memory. Usually, the size of a cache memory is only a few kilo-bytes and the contents of a cache memory are duplicated from the main memory. The collection of data elements in a cache memory is a subset of main memory and must be kept coherence with their corresponding data elements in the main memory. We will explain how program instructions and data elements are expressed in high-level programming language such as C in the following sections.

The term identifieris a synonym of name.The study of a programming language are divided into three aspects: syntax, semantics, and pragmatics. Programming language syntax is usually specified using a formal method called Backus-Naur Form (BNF). We will give an informal description of syntax rules. The syntax of identifiers for C programming language is:

Examples are a, id, count, PI, _this_is_an_identifier, abc123def, letters_digits_underscore, and aaa_bbb_ccc. However, 2abcd, it's, and this-is-not-an-identifier are not identifiers. Two remarks about identifiers of C programming language are:

C programming language specifies a set of reserved words which are used for special purpose and cannot be used as an identifier. The list of reserved words is:

Several reserved words have been discussed. Type declarers char, int, short, long, signed, unsigned, float, and double are used for character, integer, and float-point variable declaration. Also, void means null type and sizeof returns the size of a type, return is to return a value from a function. The rest of reserved words will be explained in later sections.

Usually, a C program contains many variables that each is declared as one of many types. All program variables must be declared before they are referenced. In the theory of programming language, a program variable is characterized with six attributes:

Primitive data types are explained in Section 2.6 and more data types will be covered in Section 5.2 and Section 6.3. We will focus on variable attributes name, location, value, scope, and lifetime.

Name, location, value of a program variable is depicted using the following diagram:

In the above diagram, a variable is denoted using its identifier (name), the variable location is represented as a rectangle, and the content of a variable location is indicated by an arrow pointing to a circle meaning a value. A variable appears in the form of its name in a C program, but it may refer to a location or a value. There are rules implying reference of different variable attributes:

Note that, in the case of scanf("%d", &a), &a denotes the location of variable a. Let us consider the following integer declarations:

If variable abc is stored at memory location 0x4008 and variable xyz is stored at memory location 0x400C, they are depicted as the following diagrams:

that show variables named abc and xyz at memory location 0x4008 and 0x400C (we omit the hexadecimal prefix 0x) with value unknown and 10, respectively.

A variable in a C program declared outside all functions is called a global variable. A variable declared in a function (subroutine) is called a local variable to that function. A variable can be declared with the reserved word static in front of the type name. A global variable can be referenced by all functions, i.e., its scope is all functions of the program. A local variable is only accessible by the statements of the function declaring it, i.e., its scope it the function itself. Global variables and static variables are bound to memory locations from the beginning the end of program execution, i.e., their lifetime is the entire program execution time. A static variable will retain its value even when execution of its corresponding function is completed. That is, a static variable is history-sensitive. A non-static local variable is bound to a memory location only when the function declaring this variable is activated, i.e., the lifetime of a non-static local variable is from the time its corresponding function is called to completion time of the function.

Consider the following program. Variable a is a global variable because it is declared outside functions foo and main. Function foo consists of two local variables: a static variable m and a non-static variable n. Function main consists a non-static local variable b. The scope of global variable a is the entire program that it can be accessed by the statements of functions foo and main. However, variables m and n can only be accessed by the statements of foo and variable b is only accessible by the statements of function main.

int a;

void foo (void) {

static int m;

int n;

}

int main (void) {

int b;

}

The lifetime of global variable a and static variable m is the execution time of the entire program. The lifetime of local variable n spans the execution of function foo. Since the execution of function main spans the entire program execution time, local variable b is bound to a memory location during execution of the entire program.

The following program example history_sensitive.c contains two functions foo() and main(). Two integer variables are, one static and another non-static, are declared and their values are output in function foo(). Function foo() is called in function main() three times. Static variable a is initialized to 0 only when function foo() is called at the first time and then its value is preserved. Starting from the second time when foo() is called, the value of static variable a is accumulated. In Line 9, the output value of a is 1, 2, and 3 in three calls of foo(), respectively. Hence, it is a history-sensitive variable. However, the non-static variable b is initialized when every time foo() is called. In Line 9, the output value of b is always 1.

#include <stdio.h>

void foo(void) {
static int a = 0;
int b = 0;

a = a + 1;
b = b + 1;
printf("a: %d, b: %d\n", a, b);
}

int main(void) {
foo();
foo();
foo();
return 0;
}

a: 1, b: 1
a: 2, b: 1
a: 3, b: 1

A variable can be defined as a constant, i.e., its value will stay the same during its lifetime. A constant in C programming language is defined by the reserved word const in front of a type name as the following syntax:

const float PI = 3.14159; // a floating-point constant

const int addr = 0x4040; // a hexadecimal constant of decimal value 16448

const float a = 12.5432e3; // a floating-point constant of value 12543.200000

C programming language allows definition of new types in a program. A type definition starts with the reserved word typedef and followed by a type and an identifier as the following syntax:

Type definition in C is for the purpose of convenience that it does not enforce type checking of user defined types. That is, variables of different user defined type may be assigned to each other. However, some programming languages such as Java's classes are not type compatible. Consider the following C program temperature_c2f.c:

#include <stdio.h>

typedef float centigrade;
typedef float fahrenheit;

int main(void) {

centigrade cDeg;
fahrenheit fDeg;

printf("Please enter a temperature degree in Centigrade: ");
scanf("%f", &cDeg);
fDeg = cDeg * 9.0 / 5.0 + 32.0;
printf("The temperature is %4.2f degrees Fahrenheit.\n", fDeg);

return 0;
}

The program defines two new data types centigrade and fahrenheit with the base type float in Lines 3 and 4. In Lines 8 and 9, variables cDeg and fDeg are declared as centigrade and fahrenheit type, respectively. In C programming language, variables of different user defined types can be accessed in a statement without explicit type conversion if they are defined with the same base types. Hence, Line 13 contains variables of both centigrade and fahrenheit types. We list two samples of execution result below:

Please enter a temperature degree in Centigrade: 37.5
The temperature is 99.50 degrees Fahrenheit.

Please enter a temperature degree in Centigrade: 100
The temperature is 212.00 degrees Fahrenheit.

An expression is a constant, a variable, an arithmetic expression, or a logical expression. Most high-level programming languages support arithmetic operations and logical operations. Arithmetic operations and logical operations can be written as expressions.

In C programming language, the syntax of arithmetic expressions can be specified as the following BNF rules:

Arithmetic Expressions:
infix binary operations	operand1 + operand2 operand1 - operand2 operand1 * operand2 operand1 / operand2 operand1 % operand2
prefix unary operations	++identifer --identifer
postfix unary operations	identifer++ identifer--

For infix binary operations, symbols +, -, *, and / are the addition, subtraction, multiplication, and division operators. They can be applied to operands of any primitive data type. Symbol % is the remainder operator which is applied to integer operands only. Note that char, int, short, long, and their unsigned data types are treated as integers when they are applied to arithmetic operations. The spaces before and after the operators are not required that no spacing or more than one spaces is also allowed. The operands themselves can be other arithmetic expressions. For example, expression a*b+c*d has a*b and c*d as the operands of addition +.

The prefix and postfix unary operations are applied to a variable of primitive data types only. In fact, these prefix and postfix unary operations are assignment statements, not as simple as expressions. Both expressions ++a and a++ are equivalent to assignment a=a+1 and expressions --a and a-- are equivalent to assignment a=a-1. That is, Lines 1 to 3 have the same computational effect and Lines 4 to 6 have the same computational effects.

a = a + 1;

++a;

a++;

a = a - 1;

--a;

a--;

However, when an unary operation appears as an operand in an arithmetic expression, the prefix and postfix operations may cause different side effect. If a prefix operation is an operand of another operation, the increment/decrement operation is performed before the value of the variable is loaded. If a postfix operation is an operand of another operation, the value of the variable is loaded before the increment/decrement operation is performed. Consider program prefix_postfix.c. Both ++a and b++ will set the values of variable a and b to 11. However, c is 16 which is the sum of the new value of variable a and 5; d is 15 which is the sum of the original value of variable b and 5.

#include <stdio.h>

int main(void) {
int a = 10;
int b = 10;
int c, d;

c = ++a + 5;
d = b++ + 5;
printf("a: %d, b: %d\n", a, b);
printf("c: %d, d: %d\n", c, d);

return 0;
}

a: 11, b: 11
c: 16, d: 15

The prefix and postfix operations in C programming language are only abbreviation of assignment statements with special semantics. Whenever you have a doubt of their computational effect, it is a good idea not to use prefix and postfix operations in your program. They can be written as a sequence of assignment statements. The statement of Line 8 in prefix_postfix.c can be rewritten as:

In a programming language, arithmetic operations must be evaluated in a given order to ensure correct computational results. Conventionally, multiplications and divisions are evaluated before addition and subtraction if a sub-expression is not enclosed by parentheses. That is, multiplications are evaluated first before additions in expression a+b*c and a*b+c. Parentheses are required, e.g., (a+b)*c and a*(b+c), to force additions being evaluated before multiplications. Associativity is an issue when a program contains expression such as a-b-c and a/b/c, a%b%c. Left to right associativity will result in (a-b)-c, (a/b)/c, and (a%b)%c; right to left associativity will result in a-(b-c), a/(b/c), and a%(b%c). They will give different computational results. In C programming language, precedence rules and associativity rules of arithmetic operations are specified as below:

Operators	Precedence	Associativity
()	highest	left to right
++, --, unary +, unary -	second highest	right to left
binary *, binary /, binary %	second lowest	left to right
binary +, binary -	lowest	left to right

However, for an expression, such as operand1+operand2, evaluation order of the two operands are not specified in C programming language. It may cause a problem, for example, if the values of variables a and b are 2 and 3 originally, (++a)+a*b has result 12, if the evaluation is from left to right, and it has result 9, if the evaluation is from right to left. We use a simple program evaluation_order.c to test that the evaluation order of Dev-C⁺⁺ is from left to right. Unfortunately, the evaluation order may vary from a compiler to another one. It is important for a programmer to find out the evaluation order of the compiler he/she uses or avoid side-effect in an expression.

#include <stdio.h>

int main(void) {
int a = 2;
int b = 3;
int c;

c = (++a) + a * b;
printf("a: %d, b: %d, c: %d\n", a, b, c);

return 0;
}

a: 3, b: 3, c: 12

To avoid side-effect, Line 8 of program evaluation_order.c may be rewritten to the following sequence of assignments:

To rewrite this assignment to avoid side-effect, a temporary variable temp is used and the statement is rewritten to:

Assignment is the basic statement construct of C programming language. The syntax is assignment statement is specified as below:

The equal sign = in an assignment statement is not an equivalence relation. Its semantics is assigning the result of evaluating expression on the right-hand-side and to the content of location referenced by variable on the left-hand-side of the equal sign. The variable on the left-hand-side of the equal sign of an assignment denotes a memory location and is called an lvalue. Examples of assignments are:

The first example is the assignment of an arithmetic expression. The second example is to computer an average. The third example is to assign 2^m+3ⁿ to variable number, where pow() is a function of C library math.h. According to the precedence and associativity rules, operations are evaluated in a certain order (including loading of operand values and execution of operators). For example, the numerals below the operators and operands indicate the order of evaluation:

C programming language also allows variation of assignment, if the left-hand-side variable is the first operand of an operator. The assignments on the first column is equivalent to the corresponding on on the second column.

Variation of Assignments
variable = variable + expression	variable += expression
variable = variable - expression	variable -= expression
variable = variable * expression	variable = expression*
variable = variable / expression	variable /= expression
variable = variable % expression	variable %= expression

Note that, however, a*=b+c*d is not equivalent to a=a*b+c*d. Parentheses are required to force the expression on the right-hand-side as the second operand of multiplication operator, i.e., it is equivalent to a=a*(b+c*d).

For a strongly typed programming language, the left-hand-side variable and all variables in the right-hand-side must be type compatible, or, have the same type. However, C programming language allows they may be declared as different types and employs an implicit type conversion rule:

When evaluating the right-hand-side expression that an operator has two operands of two different types, a "narrower" operand is converted to a "wider" operand.
The evaluation result of the right-hand-side expression is converted to the type of the left-hand-side variable.

Program type_conversion.c shows implicit type conversion of a C program. In Line 7, variable f is of type float and a is of int. Hence, expression f/a will convert a to float type and performs a floating-point division to obtain quotient 3.08 and then the result is converted to an integer because b is of type int of value 3. In Line 8, a floating-point subtraction is evaluated to yield the result 7.50. However, in Line 9, all operations are floating-point operations that the result is 7.42. It is important to understand how data types are converted when an expression is evaluated. If it is desirable to convert the result of f/a in Line 9 to integer type, explicit casting can be used for this purpose. That is, change Line 9 to h=e-(int)f/a.

#include <stdio.h>

int main(void) {
int a = 10, b;
float e = 10.50, f = 30.80, g, h;

b = f / a;
g = e - b;
h = e - f / a;
printf("b=%d, g=%4.2f, h=%4.2f\n", b, g, h);

return 0;
}

b=3, g=7.50, h=7.42

Program prefix_postfix.c is a situation of using assignments as expressions. In C programming language, a expression is allowed to consist of assignment. Program assignment_as_expression.c contains an assignment (Line 7) whose two operands on the right-hand-side are both assignments with side-effect. Since the evaluation order of the addition operator is from left to right, variable a in a=a*b is the original value of a and variable a in b=a+c is the new value of a after execution of a=a*b.

#include <stdio.h>

int main(void) {
int a = 2, b = 3, c = 5;
int ans;

ans = (a = a * b) + (b = a + c);
printf("a=%d, b=%d, c=%d, ans=%d\n", a, b, c, ans);

return 0;
}

a=6, b=11, c=5, ans=17

It is easy to cause errors in Line 7 of this program because of uncertainty of evaluation order and complicated effect. It would be a good practice for a program not to write programmers with side effect. One solution is to rewrite the assignment of Line 7 to the following sequence of statements:

Finally, we show a program using constant, arithmetic operations, and assignments. Program circle_circumference_area.c reads a radius of a circle, then, computes and outputs the circumference and area of the circle. In Line 6, constant PI is defined as the ratio of the circumference to the diameter of a circle.

#include <stdio.h>

int main(void) {

float radius, circumference, area;
const float PI = 3.14159;

printf("Enter the radius: ");
scanf("%f", &radius);
circumference = 2 * PI * radius;
area = PI * radius * radius;
printf("Radius:\t\t%10.4f\t%10.4E\n", radius, radius);
printf("Circumference:\t%10.4f\t%10.4E\n", circumference, circumference);
printf("Area:\t\t%10.4f\t%10.4E\n", area, area);

return 0;
}

auto	break	case	char	const	continue
default	do	double	else	enum	extern
float	for	goto	if	int	long
register	return	short	signed	sizeof	static
struct	switch	typedef	union	unsigned	void
volatile	while

	a = a * a - b / (c + d * e);
evaluation order	1 3 2 11 4 10 5 9 6 8 7