
NPTEL

Course On

STRUCTURAL RELIABILITY
Module # 02 Lecture 1
Course Format: Web

Instructor: Dr. Arunasis Chakraborty Department of Civil Engineering Indian Institute of Technology Guwahati

1. Lecture 01: Basic Statistics

Scatter Diagram, Histogram and Frequency Polygon

Observation data samples are presented as scattered points, which may be independent of, or dependent on, any other random variable. The way sample data is presented is important because it reveals statistical properties such as correlation and range. A statistical observation sample is generally represented by scatter diagrams, histograms and frequency polygons. The random variables associated with these observations are discrete.

A scatter diagram is generally drawn in 2D or 3D, presenting two or three random variables, respectively. Figure 2.1.1 shows typical scatter diagrams for two random variables. Each random variable must have observation data that can be plotted as discrete points on the graph, so the statistical data must come from simultaneous measurement of the random variables. A scatter diagram shows the nature of the relation between the random variables. For example, if two random variables show an increasing trend in the scatter diagram, they are positively correlated, and a decreasing trend indicates negative correlation [see Figure 2.1.1]. If the increasing or decreasing trend is very strict (i.e. the points nearly follow a straight line), the correlation is close to +1 or −1, respectively [see Figure 2.1.1]. At times the trend in a scatter diagram is neither uniformly increasing nor decreasing; such data have nearly zero linear correlation, although a strong quadratic correlation may still exist between the pair of random variables [see Figure 2.1.1].

Histograms represent the grouped frequency distribution of observation data. They are bar-like representations in which the width of a bar is the class interval of the data and the height of the bar is the frequency density of the data falling in the associated class (see Figure 2.1.2).
The area of each bar represents its class frequency, as expressed in Eq. 2.1.1.

$$\text{Area of each rectangle} = \text{width} \times \text{height} = (\text{width of class}) \times (\text{frequency density}) = (\text{width of class}) \times \frac{\text{class frequency}}{\text{width of class}} = \text{class frequency} \tag{2.1.1}$$

Figure 2.1.1 Scatter diagrams showing different types and degrees of correlation (panels: positive, negative, zero, zero, +1 and −1)

Figure 2.1.2 Typical example of histogram and frequency polygon (horizontal axis: Values; vertical axis: Frequency)

Before plotting a histogram, one has to form a frequency table containing the classes and their frequencies. The choice of the number of classes plays a crucial role in forming the frequency table and, in turn, the histogram. An appropriate number of classes may generally be chosen using

$$k = 1 + 3.3 \log_{10} n \tag{2.1.2}$$

where $n$ is the number of observations (sample size) and $k$ is the number of classes. An alternative to the histogram is the frequency polygon, which is formed by joining the mid-values of each class, as shown in Figure 2.1.2. If all classes have the same width, then the area under the histogram is the same as the area under the frequency polygon. The curve formed by the frequency polygon gives an idea of the frequency distribution of the data.
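As a minimal sketch of Eq. 2.1.1 and Eq. 2.1.2, the Python snippet below builds a frequency table with equal-width classes and checks that the area of each bar equals its class frequency. The sample values, the function names and the choice to round the class count up to the next integer are illustrative assumptions, not part of the lecture.

```python
import math

def sturges_classes(n):
    """Number of classes from Eq. 2.1.2 (rounding up is an assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))

def frequency_table(data, k):
    """Split data into k equal-width classes.

    Returns (edges, frequencies, densities), where
    density = class frequency / class width.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    freqs = [0] * k
    for x in data:
        # The largest value would index one past the end,
        # so it is clamped into the last class.
        idx = min(int((x - lo) / width), k - 1)
        freqs[idx] += 1
    edges = [lo + i * width for i in range(k + 1)]
    densities = [f / width for f in freqs]
    return edges, freqs, densities

# Hypothetical sample of 20 observations.
data = [12, 15, 17, 18, 21, 22, 22, 24, 25, 27,
        28, 29, 31, 33, 34, 36, 38, 41, 44, 48]

k = sturges_classes(len(data))
edges, freqs, densities = frequency_table(data, k)
width = edges[1] - edges[0]

for left, f, d in zip(edges, freqs, densities):
    area = width * d  # Eq. 2.1.1: area = class width * frequency density
    print(f"[{left:5.1f}, {left + width:5.1f})  frequency={f}  "
          f"density={d:.3f}  area={area:.1f}")
```

Plotting the densities (rather than the raw frequencies) as bar heights is what makes the bar areas comparable when class widths differ.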

Measures of Central Tendency

A whole set of observations can be described by a single value. This value usually occupies a central position, such that some observations are larger and others are smaller than it; such values are known as measures of central tendency. There are three measures of central tendency: mean, median and mode.

Mean

There are three types of mean: arithmetic mean, geometric mean and harmonic mean. The words 'mean' and 'average' on their own refer to the arithmetic mean. In this course only the arithmetic mean is discussed.

Arithmetic Mean (AM)

It is defined as the sum of a set of observations divided by the size of the set. Consider observations $x_1, x_2, \dots, x_n$, where $n$ is the number of observations; their AM ($\bar{x}$) is

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{2.1.3}$$

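Now, say $x_1, x_2, \dots, x_n$ have frequencies $f_1, f_2, \dots, f_n$ respectively, i.e. $x_1$ occurs $f_1$ times, $x_2$ occurs $f_2$ times and so on. Then the sum of all the observations (there are $f_1 + f_2 + \dots + f_n$ of them) is

$$\underbrace{x_1 + x_1 + \dots + x_1}_{f_1 \text{ times}} + \underbrace{x_2 + x_2 + \dots + x_2}_{f_2 \text{ times}} + \dots + \underbrace{x_n + x_n + \dots + x_n}_{f_n \text{ times}} = f_1 x_1 + f_2 x_2 + \dots + f_n x_n \tag{2.1.4}$$

Hence, the arithmetic mean is

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum f_i x_i}{\sum f_i} \tag{2.1.5}$$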

This is sometimes referred to as the weighted arithmetic mean.

Important properties of AM

1. The sum of a set of observations is equal to the product of the number of observations and the AM.

$$\sum x_i = n\bar{x} \quad \text{and} \quad \sum f_i x_i = N\bar{x} \tag{2.1.6}$$

where $N = \sum f_i$ is the total frequency. The first relation in Eq. 2.1.6 applies to the simple sum, whereas the second relation applies to the weighted sum.

2. For given observations, the sum of deviations from their mean is always 0.

$$\sum (x_i - \bar{x}) = 0, \ \text{where } \bar{x} = \frac{\sum x_i}{n}, \quad \text{and} \quad \sum f_i (x_i - \bar{x}) = 0, \ \text{where } \bar{x} = \frac{\sum f_i x_i}{N} \tag{2.1.7}$$

3. If two variables $x$ and $y$ are related in such a way that $y = a + bx$, where $a$ and $b$ are constants, then

$$\bar{y} = a + b\bar{x} \tag{2.1.8}$$

and vice versa. Eq. 2.1.8 shows that if each observation is added to, subtracted from, multiplied by or divided by a constant, then the mean undergoes the same mathematical operation with the same constant.

4. Let two groups of observations of sizes $n_1$ and $n_2$ have means $\bar{x}_1$ and $\bar{x}_2$. Then the combined mean ($\bar{x}$) of the composite group of $n_1 + n_2$ ($= N$) observations is given by

$$N\bar{x} = n_1\bar{x}_1 + n_2\bar{x}_2 \tag{2.1.9}$$

This can be generalised to any number of groups as

$$N\bar{x} = \sum n_i \bar{x}_i, \quad \text{where } N = \sum n_i \tag{2.1.10}$$

5. The sum of squares of deviations has the smallest value if the deviations are taken from the mean (AM). For any reference value $A$,

$$\sum (x_i - A)^2 \ \text{is minimum when } A = \bar{x} \ \text{(simple AM)} \tag{2.1.11}$$

$$\sum f_i (x_i - A)^2 \ \text{is minimum when } A = \bar{x} \ \text{(weighted AM)}$$
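The weighted mean and the five properties listed above can be checked numerically. The sketch below is only an illustration: the frequency distribution, the constants `a` and `b`, and the way the data is split into two groups are all made-up assumptions.

```python
import math

# Hypothetical frequency distribution: value x_i occurs f_i times.
values = [2, 4, 6, 8]
freqs = [1, 3, 4, 2]

total_freq = sum(freqs)  # N = sum of f_i
# Weighted arithmetic mean (Eq. 2.1.5).
weighted_mean = sum(f * x for f, x in zip(freqs, values)) / total_freq

# Expand to the raw observations and take the simple mean (Eq. 2.1.3).
raw = [x for x, f in zip(values, freqs) for _ in range(f)]
simple_mean = sum(raw) / len(raw)

# Property 1: the sum of the observations equals n times the mean.
sum_matches = math.isclose(sum(raw), len(raw) * simple_mean)

# Property 2: weighted deviations from the mean sum to zero.
deviation_sum = sum(f * (x - weighted_mean) for f, x in zip(freqs, values))

# Property 3: if y = a + b*x then mean(y) = a + b*mean(x).
a, b = 3.0, 2.0
mean_y = sum(a + b * x for x in raw) / len(raw)

# Property 4: combined mean of two groups.
group1, group2 = raw[:4], raw[4:]
mean1 = sum(group1) / len(group1)
mean2 = sum(group2) / len(group2)
combined_mean = (len(group1) * mean1 + len(group2) * mean2) / len(raw)

# Property 5: the sum of squared deviations is smallest about the mean.
def sum_of_squares(ref):
    return sum((x - ref) ** 2 for x in raw)

print("weighted mean:", weighted_mean, " simple mean:", simple_mean)
print("sum equals n * mean:", sum_matches)
print("sum of deviations:", deviation_sum)
print("mean of y:", mean_y, " a + b * mean of x:", a + b * simple_mean)
print("combined mean:", combined_mean)
print("sum of squares about mean, mean + 0.5, mean - 0.5:",
      sum_of_squares(simple_mean),
      sum_of_squares(simple_mean + 0.5),
      sum_of_squares(simple_mean - 0.5))
```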

Median

The middle-most value when a set of observations is sorted in order of magnitude is called the median. It can be calculated from a grouped frequency distribution by using the formula

$$\text{Median} = l_1 + \frac{N/2 - F}{f_m} \times c \tag{2.1.12}$$

where $l_1$ is the lower bound of the median class, $N$ is the total frequency, $F$ is the cumulative frequency corresponding to $l_1$, $f_m$ is the frequency of the median class and $c$ is the width of the median class. The median is, in a certain sense, the real measure of central tendency because it gives the value of the most central observation. Moreover, it is unaffected by extreme (very high or very low) values, and can be easily calculated from frequency distributions with open-end classes.

Mode

The value in a set of observations which occurs with the highest frequency is known as the mode. It reflects the most often occurring value. It is generally calculated as

$$\text{Mode} = l_1 + \frac{d_1}{d_1 + d_2} \times c \tag{2.1.13}$$

where $l_1$ is the lower bound of the highest frequency class, $d_1$ is the difference between the frequencies of the highest frequency class and the preceding class, $d_2$ is the difference between the frequencies of the highest frequency class and the following class, and $c$ is the common width of the classes. Eq. 2.1.13 is applicable only when all classes have the same width. Note that the mode has a peculiarity: if all observations occur with equal frequency, the mode does not exist.

Relation Between Mean, Median and Mode

An interesting approximate empirical relationship between mean, mode and median exists, and it can be expressed as

$$\text{Mean} - \text{Mode} \approx 3(\text{Mean} - \text{Median}) \tag{2.1.14}$$

Note: this expression holds only approximately, for distributions with a single mode and moderate asymmetry.
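The grouped-data formulas for the median (Eq. 2.1.12) and the mode (Eq. 2.1.13) can be sketched in Python as follows. The frequency table is hypothetical, the function names are illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption made here for the edge classes.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median of a grouped frequency distribution (Eq. 2.1.12)."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("empty frequency table")
    cumulative = 0  # cumulative frequency below the current class
    for l1, f in zip(lower_bounds, freqs):
        if cumulative + f >= total / 2:
            # This is the median class.
            return l1 + (total / 2 - cumulative) / f * width
        cumulative += f

def grouped_mode(lower_bounds, freqs, width):
    """Mode of a grouped distribution with equal class widths (Eq. 2.1.13)."""
    m = freqs.index(max(freqs))  # modal class (the first one if tied)
    # Assumption: a missing neighbour is treated as a class of frequency 0.
    prev_f = freqs[m - 1] if m > 0 else 0
    next_f = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - prev_f
    d2 = freqs[m] - next_f
    if d1 + d2 == 0:
        raise ValueError("modal class is not a distinct peak")
    return lower_bounds[m] + d1 / (d1 + d2) * width

# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_bounds = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 9, 6]
width = 10

median = grouped_median(lower_bounds, freqs, width)
mode = grouped_mode(lower_bounds, freqs, width)
print(f"median = {median:.3f}, mode = {mode:.3f}")
```

For this table the median class is 20-30 (the cumulative frequency reaches 25 there, past N/2 = 20), and the same class is also the modal class.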

Standard Deviation and Variance

Variance is defined as the arithmetic mean of the squared deviations from the mean. The deviations from the mean, the squared deviations from the mean and the variance are shown below.

Deviations from mean:

$$x_1 - \bar{x},\ x_2 - \bar{x},\ \dots,\ x_n - \bar{x} \tag{2.1.15}$$

Squared deviations from mean:

$$(x_1 - \bar{x})^2,\ (x_2 - \bar{x})^2,\ \dots,\ (x_n - \bar{x})^2$$

Mean of squared deviations from mean:

$$\frac{1}{n}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2\right]$$

Variance is generally denoted by $\sigma^2$. Expressions for a simple series as well as for a frequency distribution are given below.

For simple series,

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{2.1.16}$$

For frequency distribution,

$$\sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i} \tag{2.1.17}$$

Standard deviation is defined as the square root of the variance. It is evaluated as shown in Eq. 2.1.18.

$$\text{Standard Deviation } (\sigma) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{2.1.18}$$

Both variance and standard deviation are vital tools for describing statistical data, as they show the dispersion of the data about its mean.
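A short Python sketch of the variance and standard deviation, using the definition as the mean of squared deviations (divisor n, not n − 1). The data are hypothetical, and the cross-check against `statistics.pvariance` and `statistics.pstdev` from the standard library is added here only for illustration.

```python
import math
import statistics

# Hypothetical simple series.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Variance of a simple series: mean of the squared deviations.
variance = sum((x - mean) ** 2 for x in data) / n
std_dev = math.sqrt(variance)

# The same data written as a frequency distribution.
values = [2, 4, 5, 7, 9]
freqs = [1, 3, 2, 1, 1]
total_freq = sum(freqs)
mean_f = sum(f * x for f, x in zip(freqs, values)) / total_freq
variance_f = sum(f * (x - mean_f) ** 2
                 for f, x in zip(freqs, values)) / total_freq

print("variance (simple series):", variance)
print("variance (frequency distribution):", variance_f)
print("standard deviation:", std_dev)
# Population versions from the standard library, for comparison.
print("statistics.pvariance:", statistics.pvariance(data))
print("statistics.pstdev:", statistics.pstdev(data))
```

Note that `statistics.variance` and `statistics.stdev` (without the `p`) divide by n − 1 and would therefore give different values from the definition used in this lecture.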

Covariance and Correlation Coefficient

Covariance is defined for a pair of random variables which are associated or related to each other. It is the average of the product of the individual deviations from the corresponding means. Eq. 2.1.19 shows the covariance $\mathrm{Cov}(x, y)$ between two correlated random variables $x$ and $y$, for $n$ paired observations $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$.

$$\mathrm{Cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n} \tag{2.1.19}$$

Expanding Eq. 2.1.19, one can get

$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y} \tag{2.1.20}$$

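Generally, one expresses the correlation of two random variables in terms of the coefficient of correlation ($\rho$), which is the ratio of the covariance to the product of the individual standard deviations of the two random variables.

$$\rho = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} \tag{2.1.21}$$

Substituting the values of $\mathrm{Cov}(x, y)$, $\sigma_x$ and $\sigma_y$ from Eq. 2.1.19 and 2.1.18 in Eq. 2.1.21, one gets

$$\rho = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \tag{2.1.22}$$

Expanding Eq. 2.1.22,

$$\rho = \frac{\sum x_i y_i - \bar{x}\sum y_i - \bar{y}\sum x_i + n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2\right)\left(\sum y_i^2 - 2\bar{y}\sum y_i + n\bar{y}^2\right)}} \tag{2.1.23}$$

As $\sum x_i = n\bar{x}$ and $\sum y_i = n\bar{y}$, one can substitute these into the above equation and, on simplifying,

$$\rho = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}} \tag{2.1.24}$$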
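The covariance and the coefficient of correlation can be sketched in Python in two ways: from deviations about the means, and from raw sums. Both routes should agree. The paired observations and function names below are hypothetical; the second data set illustrates a purely quadratic relation whose linear correlation is zero.

```python
import math

def mean(v):
    return sum(v) / len(v)

def covariance(x, y):
    """Average product of deviations from the means (divisor n)."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    sigma_x = math.sqrt(covariance(x, x))  # Cov(x, x) is the variance of x
    sigma_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sigma_x * sigma_y)

def correlation_from_sums(x, y):
    """Same coefficient computed from raw sums and the two means."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    return (sxy - n * mx * my) / math.sqrt((sxx - n * mx ** 2) *
                                           (syy - n * my ** 2))

# Hypothetical paired observations with a positive trend.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print("Cov(x, y) =", covariance(x, y))
print("rho (deviation form) =", correlation(x, y))
print("rho (raw-sum form)   =", correlation_from_sums(x, y))

# A purely quadratic relation: strong dependence, zero linear correlation.
xq = [-2, -1, 0, 1, 2]
yq = [xi ** 2 for xi in xq]
print("rho for y = x^2 on symmetric x:", correlation(xq, yq))
```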

Percentile

A percentile is a value below which a given percentage of observations fall. For example, 99% of the observations will fall under the 99th percentile ($P_{99}$). Ranked by value, the different percentiles can be arranged as $P_1 < P_2 < \dots < P_{99}$.
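Several conventions exist for computing percentiles from a finite sample, and the lecture does not prescribe one. The sketch below assumes the nearest-rank convention, in which the p-th percentile is the observation at rank ceil(p·n/100) in the sorted data; the function name and the data are hypothetical.

```python
def percentile(data, p):
    """p-th percentile (integer p, 1..99) by the nearest-rank convention."""
    if not 1 <= p <= 99:
        raise ValueError("p must be an integer between 1 and 99")
    ordered = sorted(data)
    n = len(ordered)
    # ceil(p * n / 100) using integer arithmetic to avoid rounding issues.
    rank = -(-p * n // 100)
    return ordered[rank - 1]

# Hypothetical sample: the integers 1 to 100 in scrambled order.
data = [(37 * i) % 101 for i in range(1, 101)]

print("P50 =", percentile(data, 50))
print("P99 =", percentile(data, 99))

# Percentiles never decrease as p increases.
all_percentiles = [percentile(data, p) for p in range(1, 100)]
print("non-decreasing:", all_percentiles == sorted(all_percentiles))
```

Other conventions (for example linear interpolation between neighbouring ranks) give slightly different values on small samples, so the convention should be stated whenever percentiles are reported.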

Regression

Regression is the process of estimating the average value of one variable for a specified value of another variable. It is carried out with suitable equations (i.e., regression equations) based on the statistical data (combined as well as individual) of the random variables. For simple regression, one can consider a linear relationship between the variables. Hence, the estimate of $y$ (denoted by $\hat{y}$) is given by the regression equation of $y$ on $x$ as

$$\hat{y} - \bar{y} = b_{yx}(x - \bar{x}) \tag{2.1.25}$$

where the regression coefficient $b_{yx} = \mathrm{Cov}(x, y)/\sigma_x^2$. Similarly, the regression equation of $x$ on $y$ is given as Eq. 2.1.26 for the estimate of $x$ (denoted by $\hat{x}$)

$$\hat{x} - \bar{x} = b_{xy}(y - \bar{y}) \tag{2.1.26}$$

where the regression coefficient $b_{xy} = \mathrm{Cov}(x, y)/\sigma_y^2$. Now consider a straight-line fit, as shown below, for a better understanding of the formulation and calculations related to regression.

$$y = a + bx \tag{2.1.27}$$

where the random variable $x$ is independent whereas $y$ is dependent on $x$. The coefficients $a$ and $b$ in Eq. 2.1.27 are the unknowns to be evaluated by regression. Multiplying Eq. 2.1.27 by 1 and by $x$ in turn, and summing over the $n$ observations of the random variables, one gets

$$\sum y = na + b\sum x \tag{2.1.28}$$

$$\sum xy = a\sum x + b\sum x^2 \tag{2.1.29}$$

Dividing Eq. 2.1.28 by $n$ (the number of observations), one gets

$$\bar{y} = a + b\bar{x} \tag{2.1.30}$$

$$a = \bar{y} - b\bar{x} \tag{2.1.31}$$

Thus, the unknown coefficient $a$ is evaluated in terms of the individual means of the two random variables. Now, multiply Eq. 2.1.28 by $\sum x$ and Eq. 2.1.29 by $n$:

$$\sum x \sum y = na\sum x + b\left(\sum x\right)^2 \tag{2.1.32}$$

$$n\sum xy = na\sum x + nb\sum x^2 \tag{2.1.33}$$

Finally, subtracting Eq. 2.1.32 from Eq. 2.1.33,

$$n\sum xy - \sum x \sum y = b\left[n\sum x^2 - \left(\sum x\right)^2\right] \tag{2.1.34}$$

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \tag{2.1.35}$$

Dividing the numerator and denominator of Eq. 2.1.35 by $n^2$,

$$b = \frac{\frac{1}{n}\sum xy - \bar{x}\bar{y}}{\frac{1}{n}\sum x^2 - \bar{x}^2} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = \rho\,\frac{\sigma_y}{\sigma_x} \tag{2.1.36}$$

Thus, the other unknown coefficient $b$ is evaluated in terms of the covariance (or the coefficient of correlation) and the variance. Substituting $a$ from Eq. 2.1.31 and $b$ from Eq. 2.1.36 into Eq. 2.1.27, one gets an expression of the same form as Eq. 2.1.25.

$$y - \bar{y} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2}(x - \bar{x}) \tag{2.1.37}$$
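The straight-line fit can be sketched in Python by solving the two summed equations for the slope and intercept, then cross-checking the slope against covariance divided by the variance of x. The data and function names are hypothetical, and the variance and covariance use the divisor n.

```python
import math

def fit_line(x, y):
    """Least-squares coefficients (a, b) of y = a + b*x from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # Slope: (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    # Intercept: a = mean(y) - b * mean(x)
    a = sy / n - b * sx / n
    return a, b

def slope_from_covariance(x, y):
    """Slope as Cov(x, y) / variance of x, both with divisor n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    return cov / var_x

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"slope from Cov / variance = {slope_from_covariance(x, y):.3f}")

# The fitted line passes through the point of means (mean x, mean y).
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
print("line at mean x:", a + b * mean_x, " mean y:", mean_y)
```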
