
CS 535 Introduction to Scientific Computing

Lecture 1 : Introduction
Subhabrata Das, Assam Engineering College

Evaluation Plan

Objectives & Pre-requisites


Objectives: To serve as an introduction to several computational science and engineering techniques and tools, including modeling, simulation, and visualization (rather than a course oriented toward numerical analysis). To be useful to engineering and science graduates who are interested in learning more about problem solving using a computational approach.

Pre-requisites: Basic knowledge of programming, matrix operations, and calculus is assumed.

Course Contents
Syllabus contents:
Introduction to scientific computing.
Representing numbers in a computer: scalar data types.
Variables and constants: guidelines for variable names.
Assignment statements: mathematical and logical operators.
Keyboard input and screen output; writing a simple, linear program.
Conditional statements; arrays and subscripts; loops.
Files; plotting; functions and subroutines.
Program design; writing well-structured programs; debugging techniques.
Scientific applications of computer programs; introduction to Matlab.
Solving nonlinear equations; numerical integration; data analysis, plotting and smoothing.
Simulating simple physical, chemical and/or mathematical systems.
Simulation: the simple programming approach to difference equations.
Differential equations.

Scientific Computing
Techniques & Tools for modeling, solving, analysis and visualization

Provides the problems to be solved computationally

Provides the formalism for the model representation

Workflow of SC

An Example
Problem Identification:

An Example
Modeling:
A model is an abstract representation of a real-world process or phenomenon.
The model is represented by a mathematical formalism.

An Example
Putting it together:

Another Example

Can we solve it for 2 seconds, given the values:

Computer System Fundamentals

Important parts:
CPU:
A collection of components (registers + control unit + arithmetic and logic unit) that manages the activity of the computer. Instructions are fetched from main memory, decoded, and executed here.

Main Memory:
Stores the program being executed and the data used by the program. Faster than secondary storage, but volatile and smaller.

Secondary Storage:
Capable of permanently storing considerably more information than main memory. However, it is slower than main memory.

I/O devices:
Connected to the bus either directly or through an I/O controller. Data moves through the bus between I/O devices and main memory or secondary storage.

Bus:
The central communication facility of a computer. Data and instructions move through the bus among the devices attached to it.

Computer System Fundamentals

Important parts:
Registers:
Small storage devices with extremely fast access to their contents. There are usually several special registers. Some important ones are:
Program Counter (PC): Indicates the next instruction of the program to be executed.
Instruction Register (IR): Contains the instruction currently being executed.

Fetch-decode-execute cycle:
The control unit's manner of executing instructions:
Step 1: The next instruction is fetched from main memory into the IR.
Step 2: The type of operation for the instruction is decoded.
Step 3: The data used by the instruction is located and brought into the appropriate registers.
Step 4: The instruction is then executed.
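
As a rough illustration of the cycle, here is a minimal Python sketch of a toy machine. The instruction names (LOAD, ADD, HALT), the single accumulator register, and the function name run are hypothetical, chosen only for this example; real instruction sets differ.

```python
def run(program, data):
    """Execute a toy program and return the accumulator (hypothetical machine)."""
    pc = 0    # program counter: index of the next instruction
    acc = 0   # a single accumulator register (an assumption of this sketch)
    while True:
        ir = program[pc]            # Step 1: fetch the instruction into the IR
        pc += 1                     # the PC now indicates the next instruction
        opcode, address = ir        # Step 2: decode the type of operation
        if opcode == "HALT":
            return acc
        operand = data[address]     # Step 3: locate the data used by the instruction
        if opcode == "LOAD":        # Step 4: execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [("LOAD", 0), ("ADD", 1), ("HALT", None)]
print(run(program, data=[2, 3]))  # prints 5
```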

Computer System Fundamentals

Important parts:
Byte:
The unit of storage for main memory.

Memory Address:
Each byte in memory has an address. Addresses run 0, 1, 2, ..., N-1, where N is the number of bytes in main memory.

Bit:
A bit is either 0 or 1. A byte is usually 8 bits.

Words:
A collection of one or more consecutive bytes.

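Big and little endian:
A big-endian machine numbers the bytes in a word from left to right, so the most significant byte has the lowest address. A little-endian machine numbers them from right to left, so the least significant byte has the lowest address. In both schemes, the lowest byte number is the address of the word.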
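
The two byte orders can be seen with Python's built-in int.to_bytes. This is only a sketch; the 4-byte word 0x12345678 is an arbitrary example value, not taken from the slides.

```python
value = 0x12345678  # an arbitrary 4-byte word

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

# Byte offset within the word, then the byte stored there in each scheme.
for offset in range(4):
    print(offset, f"{big[offset]:02x}", f"{little[offset]:02x}")
```

Reading either byte sequence back with the matching byte order recovers the same word.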

Power of 2:
The size of memory is chosen to be a power of 2. The reason is that numbers inside the computer are represented in binary, so the count of values that a fixed number of bits can represent is always a power of 2. Specifically, k bits can represent the numbers from 0 to 2^k - 1, a range of 2^k different numbers.

Number Representations
We are used to base/radix 10 (decimal):
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
10, 11, 12, 13, 14, 15, 16, 17, 18, 19
10, 20, 30, ..., 100
10, ..., 100, ..., 1000, ..., 10 000

Decimal
Decimal example: 1041.203_10

  1     0     4     1   .   2      0      3
10^3  10^2  10^1  10^0    10^-1  10^-2  10^-3

Represents:
1*1000 + 0*100 + 4*10 + 1*1 + 2/10 + 0/100 + 3/1000

Number Representations
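General radix r number, N_r:

d_p  d_(p-1)  ...  d_2  d_1  d_0  .  d_(-1)  d_(-2)  ...  d_(-q)
r^p  r^(p-1)  ...  r^2  r^1  r^0     r^(-1)  r^(-2)  ...  r^(-q)

Represents:
d_p*r^p + d_(p-1)*r^(p-1) + ... + d_2*r^2 + d_1*r^1 + d_0*r^0 + d_(-1)*r^(-1) + d_(-2)*r^(-2) + ... + d_(-q)*r^(-q)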

Binary
Binary example: 1001.101_2

 1    0    0    1   .   1     0     1
2^3  2^2  2^1  2^0    2^-1  2^-2  2^-3

Represents:
1*8 + 0*4 + 0*2 + 1*1 + 1/2 + 0/4 + 1/8 = 9.625_10
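
The positional sum can be checked with a short Python sketch. The function name positional_value and its list-of-digits interface are assumptions made for this example.

```python
def positional_value(int_digits, frac_digits, radix):
    """Sum digit * radix**position for digits given most significant first."""
    p = len(int_digits) - 1
    total = 0.0
    for i, digit in enumerate(int_digits):
        total += digit * radix ** (p - i)      # weights r^p ... r^0
    for j, digit in enumerate(frac_digits, start=1):
        total += digit * radix ** (-j)         # weights r^-1, r^-2, ...
    return total


print(positional_value([1, 0, 0, 1], [1, 0, 1], 2))  # prints 9.625
```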


Decimal To Binary Conversion


1. quot = number; i = 0;
2. repeat until quot = 0:
   1. quot = quot/2;
   2. digit_i = remainder;
   3. i++;

This gives the digits from least to most significant bit.

Example:
33/2 = 16   remainder 1   (lsb)
16/2 = 8    remainder 0
8/2  = 4    remainder 0
4/2  = 2    remainder 0
2/2  = 1    remainder 0
1/2  = 0    remainder 1   (msb)

=> 33_10 = 100001_2
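
The repeated-division procedure can be sketched in Python as follows. The function name to_binary is an assumption, and the sketch handles non-negative integers only.

```python
def to_binary(number):
    """Convert a non-negative integer to a binary string by repeated division."""
    if number == 0:
        return "0"
    digits = []
    quot = number
    while quot != 0:
        quot, remainder = divmod(quot, 2)  # quotient and remainder in one step
        digits.append(str(remainder))      # digits arrive least significant first
    return "".join(reversed(digits))       # so reverse them for printing


print(to_binary(33))  # prints 100001
```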


Converting fractional numbers


Convert the integer and fractional parts separately.
For the fractional part N, producing at most n bits:
i = 0; repeat until N = 1.0 or i = n:
    N = FracPart(N); N *= 2; digit_i = IntPart(N); i++

Example:
0.8125*2 = 1.625   int = 1   (msb)
0.625*2  = 1.250   int = 1
0.25*2   = 0.500   int = 0
0.5*2    = 1.000   int = 1   (lsb)

=> 0.8125_10 = 0.1101_2

Caution: Many numbers cannot be represented exactly: 0.3_10 = 0.0[1001]..._2 (the bracketed group repeats, limited by the bit size)
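
A minimal Python sketch of the multiply-by-2 procedure follows. The function name frac_to_binary is an assumption. It uses fractions.Fraction so that 0.3 is held exactly, and it stops when the remaining fraction is zero, which is a slight restatement of the "until N = 1.0" condition.

```python
from fractions import Fraction


def frac_to_binary(x, max_bits):
    """Return up to max_bits binary digits of a fraction 0 <= x < 1."""
    x = Fraction(x)
    bits = []
    while x != 0 and len(bits) < max_bits:
        x *= 2
        bit = int(x)          # integer part is the next digit (0 or 1)
        bits.append(str(bit))
        x -= bit              # keep only the fractional part
    return "".join(bits)


print(frac_to_binary(Fraction(13, 16), 8))  # 0.8125: prints 1101
print(frac_to_binary(Fraction(3, 10), 9))   # 0.3: prints 010011001
```

The second call never reaches zero; it simply runs out of bits, which is the situation the caution describes.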

Octal and Hexadecimal


Base 8 (octal) and base 16 (hexadecimal) are sometimes used (both are powers of 2).
Octal uses 8 digits: 0, 1, 2, 3, 4, 5, 6, 7
Hex uses 16 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F

Octal and Hexadecimal

Each octal digit represents 3 bits.
Each hex digit represents 4 bits.
Examples:
13_10 = 1101_2 = (001)(101)_2 = 15_8
13_10 = 1101_2 = (1101)_2 = D_16

Octal and Hexadecimal


Conversion from decimal works the same way as for binary:
divide/multiply by 8 or 16 instead of 2
it may be easier to convert to binary first

Example:
333/8 = 41   remainder 5   (lsd)
41/8  = 5    remainder 1
5/8   = 0    remainder 5   (msd)
=> 333_10 = 515_8

Example:
333/16 = 20   remainder D (13)   (lsd)
20/16  = 1    remainder 4
1/16   = 0    remainder 1        (msd)
=> 333_10 = 14D_16
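
The same repeated division works for any base up to 16. A minimal Python sketch, in which the function name to_base and the DIGITS table are assumptions of this example:

```python
DIGITS = "0123456789ABCDEF"


def to_base(number, base):
    """Convert a non-negative integer to a digit string in base 2..16."""
    if number == 0:
        return "0"
    out = []
    while number > 0:
        number, remainder = divmod(number, base)
        out.append(DIGITS[remainder])   # least significant digit first
    return "".join(reversed(out))


print(to_base(333, 8))   # prints 515
print(to_base(333, 16))  # prints 14D
```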


Octal and Hexadecimal: Example


Binary to octal or hexadecimal:
group the bits into 3s (octal) or 4s (hex), starting from the LS bit
pad with leading zeros if required

0100 0110 1101 0111_2
= (000) (100) (011) (011) (010) (111)_2 = 43327_8
= (0100) (0110) (1101) (0111)_2 = 46D7_16

Note the padding at the front of the number in the octal grouping.
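
The grouping rule can be sketched in Python as below. The function name regroup is an assumption; it pads on the left, converts each group to one digit, and drops leading zero digits from the result.

```python
def regroup(bits, group_size):
    """Convert a binary integer string to octal (group_size=3) or hex (4)."""
    digits = "0123456789ABCDEF"
    bits = bits.replace(" ", "")
    pad = (-len(bits)) % group_size          # zeros needed at the front
    bits = "0" * pad + bits
    out = []
    for i in range(0, len(bits), group_size):
        group = bits[i:i + group_size]
        out.append(digits[int(group, 2)])    # one digit per group
    return "".join(out).lstrip("0") or "0"


print(regroup("0100 0110 1101 0111", 3))  # prints 43327
print(regroup("0100 0110 1101 0111", 4))  # prints 46D7
```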


Octal and Hexadecimal


To convert from hex/octal to binary, reverse the procedure:
FF_16 = (1111)(1111)_2 = 1111 1111_2
377_8 = (011)(111)(111)_2 = 11 111 111_2

NOTE: for fractional conversion, move from left to right and pad at the right end:
0.11001101011_2 = 0.(110)(011)(010)(110)_2 = 0.6326_8
0.11_2 = 0.(110)_2 = 0.6_8

Convert the fractional and integer parts separately.

Unsigned integers
Follow a natural binary coding. The range of integers that can be represented in n bits is from 0 to 2^n - 1.

2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
 0    0    0    0    1    1    1    0    = +14
 1    0    0    0    1    1    1    0    = +142

Signed integers : signed magnitude


Need to reserve one bit of the integer for the sign.
The simplest technique is signed magnitude:
MSB = 0: number is positive
MSB = 1: number is negative

sign bit   magnitude
           2^6  2^5  2^4  2^3  2^2  2^1  2^0
   0        0    0    0    1    1    1    0    = +14
   1        0    0    0    1    1    1    0    = -14

Signed integers : signed magnitude


Signed magnitude has problems:
more complex circuitry is needed to perform addition/subtraction
two representations of zero!
    +0 (all bits 0)
    -0 (all bits 0 except the MSB)

Need a better system to represent signed integers.

Signed integers: two's complement

Almost all computers use the two's complement system.
The MSB's place value is the negative of its normal (unsigned) value.

 -2^7    2^6   2^5   2^4   2^3   2^2   2^1   2^0
(-128)  (64)  (32)  (16)   (8)   (4)   (2)   (1)

The MSB in 8-bit binary is bit 7.

Signed integers: two's complement

Lowest value (8 bits)

 -2^7    2^6   2^5   2^4   2^3   2^2   2^1   2^0
(-128)  (64)  (32)  (16)   (8)   (4)   (2)   (1)
   1      0     0     0     0     0     0     0

-128 = -128
This bit pattern no longer represents +128 as it did in an unsigned representation.

Signed integers: two's complement

Largest value (8 bits)

 -2^7    2^6   2^5   2^4   2^3   2^2   2^1   2^0
(-128)  (64)  (32)  (16)   (8)   (4)   (2)   (1)
   0      1     1     1     1     1     1     1

64 + 32 + 16 + 8 + 4 + 2 + 1 = +127
This bit pattern still represents +127 as it did in an unsigned representation.

Signed integers: two's complement

For an n-bit number, the range is from -2^(n-1) to +2^(n-1) - 1.
For example, in 8-bit binary:
the smallest representable value is -128
the largest representable value is +127
all integer values in between are representable
no other values are representable

Positive values <= 127 use the same 8-bit pattern as their unsigned equivalent.
Negative values use bit patterns from the upper half of the unsigned integer range:
the positive range is reduced to allow both positive and negative values to be represented in a single format.

Signed vs Unsigned integers


Two possible interpretations of 8-bit values:

binary code    value as signed    value as unsigned
01111111_2     +127_10            127_10
01111110_2     +126_10            126_10
01111101_2     +125_10            125_10
...            ...                ...
00000010_2     +2_10              2_10
00000001_2     +1_10              1_10
00000000_2     0_10               0_10
11111111_2     -1_10              255_10
11111110_2     -2_10              254_10
...            ...                ...
10000010_2     -126_10            130_10
10000001_2     -127_10            129_10
10000000_2     -128_10            128_10

Positive numbers have 0 in the MSB: signed and unsigned values are the same.
Negative numbers have 1 in the MSB: signed and unsigned values differ by 256 (2^8).

Signed integers: two's complement

In n-bit binary:
represent -x by the binary code for 2^n - x
e.g., represent -2 by the code for 2^8 - 2 = 254

This is called the two's complement signed number representation:
the number is represented as its difference (complement) from a power of two.
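
The rule of coding -x as 2^n - x can be sketched in Python as follows. The function name to_twos_complement and the range check are assumptions of this example.

```python
def to_twos_complement(x, n=8):
    """Return the n-bit two's complement bit string for integer x."""
    if not -(2 ** (n - 1)) <= x <= 2 ** (n - 1) - 1:
        raise ValueError("value does not fit in n bits")
    # Non-negative values keep their unsigned code;
    # a negative value -m is represented by the code for 2^n - m.
    code = x if x >= 0 else 2 ** n - (-x)
    return format(code, f"0{n}b")


print(to_twos_complement(-2))   # code 254: prints 11111110
print(to_twos_complement(35))   # prints 00100011
print(to_twos_complement(-35))  # code 221: prints 11011101
```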

Signed integers: two's complement

Properties of the two's complement system:
The MSB has a negative place value (-2^(n-1)); the lower bits have normal (+ve) place values.
Positive numbers (including zero):
    must have MSB = 0 (worth 0 * -2^(n-1))
    zero value is all 0 bits
    same bit pattern as the equivalent unsigned value
Negative numbers:
    must have MSB = 1 (worth 1 * -2^(n-1))
    min value = 1000...0 = -2^(n-1), max value = 1111...1 = -1_10

Signed integers: to binary


What is +35_10 in 8-bit two's complement?
positive, so proceed as for unsigned:

35/2 = 17   rem 1
17/2 = 8    rem 1
8/2  = 4    rem 0
4/2  = 2    rem 0
2/2  = 1    rem 0
1/2  = 0    rem 1   (stop)

answer: 00100011
read the remainders upward, padding with 0s to 8 bits
confirm the answer is positive: MSB = 0

Signed integers: to binary


What is -35_10 in 8-bit two's complement?
represent -35 as 2^8 - 35 = 256 - 35 = 221

221/2 = 110   rem 1
110/2 = 55    rem 0
55/2  = 27    rem 1
27/2  = 13    rem 1
13/2  = 6     rem 1
6/2   = 3     rem 0
3/2   = 1     rem 1
1/2   = 0     rem 1

answer: 11011101
read the remainders upward
confirm the answer is negative: MSB = 1

Signed integers: to binary


What is -35_10 in 8-bit two's complement?
start with the equivalent positive number: +35_10 = 00100011_2
flip the bits:  11011100
add 1:          11011101

answer: 11011101
confirm the result is negative: MSB = 1

Signed integers: from binary


What value does 8-bit two's complement 01100111 have?
MSB = 0, so positive: just convert from binary as for unsigned

 -2^7    2^6   2^5   2^4   2^3   2^2   2^1   2^0
(-128)  (64)  (32)  (16)   (8)   (4)   (2)   (1)
   0      1     1     0     0     1     1     1

64 + 32 + 4 + 2 + 1 = +103

Signed integers: from binary


What value does 8-bit two's complement 11000100 have?
can convert as normal but with the MSB now worth -128

 -2^7    2^6   2^5   2^4   2^3   2^2   2^1   2^0
(-128)  (64)  (32)  (16)   (8)   (4)   (2)   (1)
   1      1     0     0     0     1     0     0

-128 + 64 + 4 = -60

Signed integers: from binary


What decimal value does 8-bit two's complement 11000100 have?
negate this -ve binary number to its +ve equivalent, convert to decimal, then negate the decimal result

11000100
00111011   flip the bits
00111100   add 1; convert from binary to get decimal +60_10
finally, negate the decimal value

Answer: -60_10
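
Both decoding routes can be checked with a short Python sketch: weighting the MSB negatively, and negating by flipping the bits and adding 1. The function names from_twos_complement and negate are assumptions of this example.

```python
def from_twos_complement(bits):
    """Decode a two's complement bit string; the MSB has weight -2^(n-1)."""
    n = len(bits)
    msb = int(bits[0])
    rest = int(bits[1:], 2) if n > 1 else 0
    return -msb * 2 ** (n - 1) + rest


def negate(bits):
    """Negate a two's complement bit string: flip every bit, then add 1."""
    n = len(bits)
    flipped = "".join("1" if b == "0" else "0" for b in bits)
    result = (int(flipped, 2) + 1) % 2 ** n   # discard any carry out of n bits
    return format(result, f"0{n}b")


print(from_twos_complement("11000100"))  # prints -60
print(negate("11000100"))                # prints 00111100 (+60)
```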

Signed addition
Two's complement: what's so great?
the addition process is the same for any mix of positive and negative numbers:
    +ve plus +ve
    +ve plus -ve
    -ve plus +ve
    -ve plus -ve

addition is also the same as for unsigned numbers

this greatly simplifies the digital logic circuits needed for doing computer arithmetic
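
A minimal Python sketch of this property: one unsigned 8-bit addition, with the carry out discarded, gives the correct signed result whenever that result fits in the range. The names N, MASK, encode, decode and add are assumptions of this example.

```python
N = 8
MASK = (1 << N) - 1   # 0xFF: keeps only the low 8 bits


def encode(x):
    """8-bit two's complement pattern of x, held as an unsigned integer."""
    return x & MASK


def decode(pattern):
    """Interpret an 8-bit pattern as a signed two's complement value."""
    return pattern - (1 << N) if pattern & (1 << (N - 1)) else pattern


def add(a, b):
    """Plain unsigned addition of two patterns, discarding the carry out."""
    return (a + b) & MASK


print(decode(add(encode(35), encode(-60))))   # prints -25
print(decode(add(encode(-35), encode(-60))))  # prints -95
```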

Real number representation


Real numbers:
numbers which are not necessarily integers
have an integer part and a fractional part

Need a way to represent (or approximate) real numbers using binary.

Scientific notation
The same idea works in binary:

+ 1.01101 * 2^1101

sign: + or -
mantissa: fixed-point binary, 1.000... to 1.111...
radix (base): constant
exponent: binary integer

This number equals 1.40625 * 2^13 = 11520_10

Scientific notation
Advantages
very wide range of representable numbers
limited by range of exponent

similar precision for all values


no wasted bits

Disadvantages
some values still not exactly representable
e.g.,


Floating point
Binary representation of numbers using a scientific-notation style (IEEE 754 standard):

sign   exponent   mantissa
0      11011001   00100110000110101100101

Floating point
Sign (S)
one bit
0: number is positive
1: number is negative

sign bit   exponent   mantissa
0          01111111   10000000000000000000000   = +1.5
1          01111111   10000000000000000000000   = -1.5

Floating point
Exponent (E)
could be represented using two's complement signed notation
however, it is represented using excess-k notation:
    the value stored in the exponent field is k more than the intended value
    k is constant (for C float, k = 127)

exponent field 00000001 (1_10):     exponent is -126
exponent field 01111111 (127_10):   exponent is 0
exponent field 11111110 (254_10):   exponent is +127

exponent fields of 000...000 and 111...111 are reserved for special meanings:
    all bits 0: denormalized numbers and zero
    all bits 1: infinity and not-a-number (indeterminate)

Floating point
Mantissa (M) is normalized
the exponent is chosen such that 1_10 <= mantissa < 2_10

110.00000 * 2^-3
11.000000 * 2^-2
1.1000000 * 2^-1   <- choose this one: the mantissa is in the range 1.000..._2 to 1.111..._2
0.1100000 * 2^0
0.0110000 * 2^1

These are all equally valid representations of the number 0.75_10

Floating point
Mantissa (M)
represented as a fixed-point value between 1.000..._2 and 1.111..._2
the first bit (before the point) is always 1:
    don't waste a bit storing it with the number; just assume it always exists

fixed precision:
    for C float, 23 bits are available
    plus 1 implicit bit: 24 significant bits

the mantissa is sometimes called the significand

The value represented by S, E, and M is (-1)^S * 2^(E-127) * (1.M)

Floating point
Example: -11520_10 as a C float
= -1.40625_10 * 2^13
sign = 1 because the number is negative
mantissa = 1.40625_10 = 1.01101_2, so store 01101 (skip the leading 1), padded on the right with 0s
exponent = +13_10, so store 13 + 127 (excess) = 140_10 = 10001100_2

1 10001100 01101000000000000000000
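
The three fields of a 32-bit float can be inspected from Python with the standard struct module. This is only a sketch: the function names float_fields and fields_to_value are assumptions, and fields_to_value applies the (-1)^S * 2^(E-127) * (1.M) formula, so it is valid for normalized numbers only.

```python
import struct


def float_fields(x):
    """Return (sign, exponent, mantissa) bit strings of x stored as a 32-bit float."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(word, "032b")
    return bits[0], bits[1:9], bits[9:]


def fields_to_value(sign, exponent, mantissa):
    """Evaluate (-1)^S * 2^(E-127) * (1.M); normalized numbers only."""
    s = int(sign, 2)
    e = int(exponent, 2)
    m = int(mantissa, 2)
    return (-1) ** s * 2.0 ** (e - 127) * (1 + m / 2 ** 23)


print(float_fields(-11520.0))
# prints ('1', '10001100', '01101000000000000000000')
print(fields_to_value("1", "10001100", "01101000000000000000000"))  # prints -11520.0
```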

Exercise
What is represented by
1 10000111 10100000000000000000000

What is represented if the exponent bits are 01111000 instead? What is represented if the exponent bits are 00000000? (Note that there is no 1 at the front of the mantissa when the exponent is all zeros.)

Exercise
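What is the value of
0 11111111 11111111111111111111111
and
0 00000000 00000000000000000000000 ?

Answer:
Applying the formula directly to the first pattern gives (-1)^0 * 2^(255-127) * (1.11...1) = 2^128 * (2 - 2^-23), approximately 2^129 or about 6.81*10^38. However, an exponent field of all 1s is reserved: with a nonzero mantissa, this pattern is not-a-number (NaN).
The second pattern has all bits 0, so it represents zero. For comparison, the smallest positive denormalized float (only the last mantissa bit set) is 2^-126 * 2^-23 = 2^-149, or about 1.4*10^-45.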

Floating point
C has floating point types of various sizes:
32 bits (float)
    1 bit sign, 8 bits exponent, 23 (24) bits mantissa, excess 127
64 bits (double)
    1 bit sign, 11 bits exponent, 52 (53) bits mantissa, excess 1023
80 bits (long double; platform dependent)
    1 bit sign, 15 bits exponent, 64 bits mantissa, excess 16383

Floating point
C does many calculations internally using doubles:
most modern computers can operate on doubles as fast as on floats
it may be less efficient to use float variables, even though they are smaller in size
long double operations may be very slow: they may have to be implemented in software

Using literal floating point values in C:
require a decimal point, e.g. 5 is of type int; use 5.0 if you want floating point
optional exponent, e.g. 5.0e-12 means 5.0 * 10^-12 (decimal)

Limitations of floating point


Size of exponent is fixed:
cannot represent very large numbers
for C float, the exponent is 8 bits and the excess is 127
the largest exponent is +127 (254_10 (11111110_2) - 127)
    11111111 is reserved for infinity (and not-a-number, NaN)

largest representable numbers:
positive: 1.1111..._2 * 2^127 = 3.403..._10 * 10^38
negative: -1.1111..._2 * 2^127 = -3.403..._10 * 10^38

overflow occurs if numbers larger than this are produced
    rounds up to infinity

solution: use a floating point format with a larger exponent
    double (11 bits), long double (15 bits)

Limitations of floating point


Size of exponent is fixed:
cannot represent very small numbers
for C float, the exponent is 8 bits and the excess is 127
the smallest exponent is -126 (1_10 (00000001_2) - 127)
    00000000 is reserved for zero (and denormalized numbers, where the implied bit is 0 and the exponent is -126)

smallest representable (normalized) numbers:
positive: 1.000..._2 * 2^-126 = 1.175..._10 * 10^-38
negative: -1.000..._2 * 2^-126 = -1.175..._10 * 10^-38

underflow occurs if a calculation produces a number smaller than the representable limit
    rounds down to zero

solution: use a floating point format with a larger exponent
    double (11 bits), long double (15 bits)
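
The 32-bit limits can be computed from the field widths, as in the Python sketch below. Note the assumptions: Python's own float is a 64-bit double, so the overflow and underflow lines demonstrate the behaviour at the double limits rather than the float limits, and the names FLOAT32_MAX, FLOAT32_MIN_NORMAL and roundtrip_float32 are made up for this example.

```python
import math
import struct
import sys

# Largest float: mantissa 1.111...1 (24 bits) with exponent +127, about 3.403e38.
FLOAT32_MAX = (2 - 2.0 ** -23) * 2.0 ** 127
# Smallest normalized float: mantissa 1.000...0 with exponent -126, about 1.175e-38.
FLOAT32_MIN_NORMAL = 2.0 ** -126


def roundtrip_float32(x):
    """Store x as a 32-bit float and read it back as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


print(FLOAT32_MAX, FLOAT32_MIN_NORMAL)

# Overflow and underflow, shown at the limits of the 64-bit double.
too_big = sys.float_info.max * 2   # rounds up to infinity
too_small = 2.0 ** -1074 / 2       # half the smallest denormal rounds down to zero
print(math.isinf(too_big), too_small == 0.0)  # prints True True
```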

Limitations of floating point

Size of mantissa is fixed:
limited precision in representations
C float has 23 (24) bits of mantissa
    2^24 is approximately 10^7
C float has (almost) 7 decimal digits of precision

solution: use a floating point format with a larger mantissa
    double (53 bits), long double (64 bits)
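
The 24-bit limit is easy to observe by forcing values through a 32-bit float. A minimal Python sketch, where the helper name to_float32 is an assumption:

```python
import struct


def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


print(to_float32(16777216.0))  # 2^24 fits in 24 significant bits: prints 16777216.0
print(to_float32(16777217.0))  # 2^24 + 1 needs 25 bits: prints 16777216.0
print(to_float32(0.1) == 0.1)  # the float and double approximations differ: prints False
```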

Limitations of floating point

Size of mantissa is fixed:
some values cannot be represented exactly
e.g. 1/3_10 = 0.010101010101010101..._2
    the continuing binary fraction never ends
    it cannot fit in 24 (or 240, or 24000) bits

solution: none; the same problem occurs in decimal scientific notation
    can use a higher precision floating point type to improve accuracy
    if exact representation is needed, use rational numbers
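
As one illustration of the rational-number option, Python's standard fractions module stores an exact numerator and denominator. This is a sketch of the idea, not a recommendation of a particular library.

```python
from fractions import Fraction

# Binary floating point: 0.1, 0.2 and 0.3 are all stored approximately.
print(0.1 + 0.2 == 0.3)   # prints False

# Rational arithmetic: every value is an exact ratio of two integers.
third = Fraction(1, 3)
print(third * 3 == 1)     # prints True
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # prints True
```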

Floating point comparison


Determine if a floating point number is less than, equal to, or greater than another floating point number.
Comparison is a common operation, so we need to be able to do it quickly.
This is the justification for the floating-point representation:
    sign-exponent-mantissa ordering within the word
    excess-k exponent representation
    normalized representation

Floating point comparison


can use integer compare logic to compare floating point numbers
    because of the sign-exponent-mantissa order

two floating point numbers are equal iff they have the same bit pattern (special cases such as +0/-0 and NaN aside); otherwise one is less than the other:
compare the sign bits
    if different, the order is then known
    if same, compare the exponent fields
        if different, the order is then known
        if same, compare the mantissas; the order is then known
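
A minimal Python sketch of comparison using only integer operations on the bit patterns. The names float32_bits and compare_float32 are assumptions. Because the exponent field sits above the mantissa, a single integer comparison of the low 31 bits covers both the exponent step and the mantissa step; the result is reversed when both numbers are negative. NaN is not handled, and +0 and -0 are treated as different patterns.

```python
import struct


def float32_bits(x):
    """Return the 32-bit pattern of x (stored as a float) as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def compare_float32(a, b):
    """Return -1, 0 or 1 for a < b, a == b, a > b using integer compares only."""
    bits_a, bits_b = float32_bits(a), float32_bits(b)
    sign_a, sign_b = bits_a >> 31, bits_b >> 31
    if sign_a != sign_b:                 # different signs: order is known
        return -1 if sign_a == 1 else 1
    mag_a = bits_a & 0x7FFFFFFF          # exponent field followed by mantissa
    mag_b = bits_b & 0x7FFFFFFF
    if mag_a == mag_b:
        return 0
    order = -1 if mag_a < mag_b else 1
    return -order if sign_a == 1 else order   # larger magnitude is smaller if negative


print(compare_float32(1.5, 2.5))    # prints -1
print(compare_float32(-1.5, -2.5))  # prints 1
```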
