
APPLIED STATISTICS

DR. S.L. LAHUDKAR


PROF., DEPT. OF ELECTRONICS & TELECOMMUNICATION
I.C.O.E.R, PUNE


Points to be Covered
What is Statistics?
Need for Statistics
Basic Terminologies
Fundamentals of Statistics
Probability Theory
Probability Density & Cumulative Distribution Functions (PDF & CDF)


Basic Engineering Model

[Flow diagram with the following stages: develop a clear description; identify important factors; propose or refine a model; conduct experiments; manipulate the model; confirm the solutions; conclusions & recommendations.]

What is Statistics?
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.

Two areas of statistics:
Descriptive Statistics: collection, presentation, and description of sample data.
Inferential Statistics: making decisions and drawing conclusions about populations.


Basic Terminologies
Population: A collection, or set, of individuals or objects or events whose properties are to be analyzed. Two kinds of populations: finite or infinite.
Sample: A subset of the population.


Variables
A variable is a characteristic or condition that can change or take on different values.
Most research begins with a general question about the relationship between two variables for a specific group of individuals.

Variables (2)
Qualitative, or Attribute, or Categorical, Variable: A variable that categorizes or describes an element of a population.
Quantitative, or Numerical, Variable: A variable that quantifies an element of a population.


Classification of Variables

[Tree diagram] Variable → Qualitative (Nominal, Ordinal) or Quantitative (Discrete, Continuous).

Variables (3)
Variables can be classified as discrete or continuous.
Discrete variables (such as class size) consist of indivisible categories.
Continuous variables (such as time or weight) are infinitely divisible into whatever units a researcher may choose. For example, time can be measured to the nearest minute, second, half-second, etc.


Measuring Variables
To establish relationships between variables, researchers must observe the variables and record their observations. This requires that the variables be measured.
The process of measuring a variable requires a set of categories called a scale of measurement and a process that classifies each individual into one category.


Measure & Variability
No matter what the response variable, there will always be variability in the data.
One of the primary objectives of statistics: measuring and characterizing variability.
Controlling (or reducing) variability in a manufacturing process: statistical process control.


Example: A supplier fills cans of soda marked 12 ounces. How much soda does each can really contain?
It is very unlikely any one can contains exactly 12 ounces of soda. There is variability in any process.
Some cans contain a little more than 12 ounces, and some cans contain a little less.
On average, there are 12 ounces in each can.
The supplier hopes there is little variability in the process, i.e., that most cans contain close to 12 ounces of soda.



Correlational Studies
The goal of a correlational study is to determine whether there is a relationship between two variables and to describe the relationship.
A correlational study simply observes the two variables as they exist naturally.


Data Collection
First problem a statistician faces: how to obtain the data.
It is important to obtain good, or representative, data.
Inferences are made based on statistics obtained from the data.
Inferences can only be as good as the data.

Process of Data Collection
Define the objectives of the survey or experiment.
  Example: Estimate the average life of an electronic component.
Define the variable and population of interest.
  Example: Length of time for anesthesia to wear off after surgery.
Define the data-collection and data-measuring schemes. This includes sampling procedures, sample size, and the data-measuring device (questionnaire, scale, ruler, etc.).
Determine the appropriate descriptive or inferential data-analysis techniques.


Methods of Data Collection
Experiment: The investigator controls or modifies the environment and observes the effect on the variable under study.
Survey: Data are obtained by sampling some of the population of interest. The investigator does not modify the environment.
Census: A 100% survey. Every element of the population is listed. Seldom used: difficult and time-consuming to compile, and expensive.


Experiments
The goal of an experiment is to demonstrate a cause-and-effect relationship between two variables; that is, to show that changing the value of one variable causes changes to occur in a second variable.


Experiments (cont.)
In an experiment, one variable is manipulated to create treatment conditions. A second variable is observed and measured to obtain scores for a group of individuals in each of the treatment conditions.
In an experiment, the manipulated variable is called the independent variable and the observed variable is the dependent variable.


Other Types of Studies
Other types of research studies, known as non-experimental or quasi-experimental, are similar to experiments because they also compare groups of scores.
These studies do not use a manipulated variable to differentiate the groups. Instead, the variable that differentiates the groups is usually a pre-existing participant variable (such as male/female) or a time variable (such as before/after).


Data
The measurements obtained in a research study are called the data.
The goal of statistics is to help researchers organize and interpret the data.


Descriptive Statistics
Descriptive statistics are methods for organizing and summarizing data.
For example, tables or graphs are used to organize data, and descriptive values such as the average score are used to summarize data.
A descriptive value for a population is called a parameter and a descriptive value for a sample is called a statistic.


Inferential Statistics
Inferential statistics are methods for using sample data to make general conclusions (inferences) about populations.
Because a sample is typically only a part of the whole population, sample data provide only limited information about the population. As a result, sample statistics are generally imperfect representatives of the corresponding population parameters.

Sampling
Sampling Frame: A list of the elements
belonging to the population from which the
sample will be drawn.
Sample Design: The process of selecting
sample elements from the sampling frame.


Types of Samples
Judgment Samples: Samples that are selected on the basis of being typical. Items are selected that are representative of the population. The validity of the results from a judgment sample reflects the soundness of the collector's judgment.
Probability Samples: Samples in which the elements to be selected are drawn on the basis of probability. Each element in a population has a certain probability of being selected as part of the sample.


Types of Samples (2)
Random Samples: A sample selected in such a way that every element in the population has an equal probability of being chosen. Equivalently, all samples of size n have an equal chance of being selected. Random samples are obtained either by sampling with replacement from a finite population or by sampling without replacement from an infinite population.


Types of Samples (3)
Systematic Sample: A sample in which every k-th item of the sampling frame is selected, starting from the first element, which is randomly selected from the first k elements.
Stratified Random Sample: A sample obtained by stratifying the sampling frame and then selecting a fixed number of items from each of the strata by means of a simple random sampling technique.


Sampling Error
The discrepancy between a sample statistic and its population parameter is called sampling error.
Defining and measuring sampling error is a large part of inferential statistics.


Notation
The individual measurements or scores obtained for a research participant will be identified by the letter X (or X and Y if there are multiple scores for each individual).
The number of scores in a data set will be identified by N for a population or n for a sample.
Summing a set of values is a common operation in statistics and has its own notation. The Greek letter sigma, Σ, will be used to stand for "the sum of." For example, ΣX identifies the sum of the scores.


Probability & Statistics
Probability: Properties of the population are assumed known. Answer questions about the sample based on these properties.
Statistics: Use information in the sample to draw a conclusion about the population.


Probability
The probability of an outcome is a number between 0 and 1 that measures the likelihood that the outcome will occur when the experiment is performed (0 = impossible, 1 = certain).
The probabilities of all sample points must sum to 1.


Events
An event is a specific collection of sample points.
The probability of an event A is calculated by summing the probabilities of the outcomes in the sample space for A.


Steps for Calculating Probabilities
1. Define the experiment.
2. List the sample points.
3. Assign the probabilities to the sample points.
4. Determine the collection of sample points contained in the event of interest.
5. Sum the sample point probabilities to get the event probability.


Example
In craps one rolls two fair dice. What is the probability of the sum of the two dice showing 7?


Sample space (36 equally likely outcomes):
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

Outcomes with sum 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1) — each with probability of occurrence 1/36.

Outcome
So the probability of rolling a sum of 7 with two dice is 6/36 = 1/6.
This example illustrates the following rule: in a sample space S of equally likely outcomes, the probability of an event A is given by

P(A) = #A / #S

that is, the number of outcomes in A divided by the total number of outcomes in S.
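The rule above is easy to check by brute-force enumeration. The short Python sketch below (my own illustration, not from the slides) lists the 36 sample points, collects the event A = "sum is 7", and applies P(A) = #A / #S.

```python
from itertools import product
from fractions import Fraction

# Sample space S: all ordered pairs from two fair dice (36 equally likely points).
S = list(product(range(1, 7), repeat=2))

# Event A: sample points whose faces sum to 7.
A = [outcome for outcome in S if sum(outcome) == 7]

# P(A) = #A / #S for equally likely outcomes.
p_A = Fraction(len(A), len(S))
print(A)    # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(p_A)  # 1/6
```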


Set Theory
Aᶜ: The complement of A is the event that A does not occur.
A ∪ B: The union of two events A and B is the event that occurs if either A or B or both occur; it consists of all sample points that belong to A or B or both.
A ∩ B: The intersection of two events A and B is the event that occurs if both A and B occur; it consists of all sample points that belong to both A and B.


Basic Probability Rules
P(Aᶜ) = 1 − P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Mutually exclusive events are events which cannot occur at the same time; P(A ∩ B) = 0 for mutually exclusive events.
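As a quick illustration (my own example, not from the slides), the sketch below checks the complement and addition rules on two dice events: A = "sum is 7" and B = "first die shows 1".

```python
from itertools import product

S = list(product(range(1, 7), repeat=2))
prob = lambda event: len(event) / len(S)   # equally likely outcomes

A = {s for s in S if sum(s) == 7}          # sum is 7
B = {s for s in S if s[0] == 1}            # first die shows 1

# Complement rule: P(A^c) = 1 - P(A)
assert abs(prob(set(S) - A) - (1 - prob(A))) < 1e-12

# Addition rule: P(A u B) = P(A) + P(B) - P(A n B)
assert abs(prob(A | B) - (prob(A) + prob(B) - prob(A & B))) < 1e-12
```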


Conditional Probability
P(A | B): the probability of A occurring given that B has occurred.
P(A | B) = P(A ∩ B) / P(B)
Multiplicative rule:
P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A)
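A small numerical check (my own example, not from the slides): with the dice events used earlier, P(sum is 7 | first die is 1) should equal P(A ∩ B) / P(B) = (1/36) / (1/6) = 1/6.

```python
from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))
prob = lambda event: Fraction(len(event), len(S))

A = {s for s in S if sum(s) == 7}    # sum is 7
B = {s for s in S if s[0] == 1}      # first die shows 1

p_A_given_B = prob(A & B) / prob(B)  # conditional probability P(A|B)
print(p_A_given_B)                   # 1/6

# Multiplicative rule: P(A n B) = P(A|B) P(B)
assert prob(A & B) == p_A_given_B * prob(B)
```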

Independent Events
A and B are independent events if the occurrence of one event does not affect the probability of the other event.
If A and B are independent, then
P(A | B) = P(A)
P(B | A) = P(B)
P(A ∩ B) = P(A) P(B)
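Continuing the dice illustration (again my own example): the events A = "sum is 7" and B = "first die shows 1" turn out to be independent, since P(A ∩ B) = 1/36 = P(A) P(B).

```python
from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))
prob = lambda event: Fraction(len(event), len(S))

A = {s for s in S if sum(s) == 7}
B = {s for s in S if s[0] == 1}

print(prob(A & B))        # 1/36
print(prob(A) * prob(B))  # 1/36  -> the product rule holds, so A and B are independent
assert prob(A & B) == prob(A) * prob(B)
```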


Continuous Random Variable
A random variable is continuous if it can take any value in an interval.
For a continuous random variable, we use the following two concepts to describe the probability distribution:
Probability Density Function (PDF)
Cumulative Distribution Function (CDF)


Probability Density Function (PDF)
The probability density function is a similar concept to the probability distribution function for a discrete random variable.
You can consider the probability density function as a smoothed probability distribution function.
Let X be a continuous random variable and let x denote the value that the random variable X takes. We use f(x) to denote the probability density function.


[Figure: comparing the probability distribution function of a discrete r.v. (Binomial, success probability 0.5, 50 trials) with the probability density function of a continuous r.v. (Normal, mean 25 and SD 3.2).]


The probability density function f(x) shows the probability that the random variable falls in a particular range (as the area under the curve over that range).


Example:
Your client told you that he will visit you between noon and 2 pm. Within that window, the time he arrives at your company is totally random. Let X be the random variable for the time he arrives (X = 1.5 means he visits your office at 1:30 pm).
Let x be the possible value for the random variable X. Then the probability density function f(x) has the following shape.


[Figure: f(x) = 0.5 for 0 ≤ x ≤ 2, and 0 otherwise. The probability that your client visits your office between 12:30 and 1:00 is given by the shaded area under f(x) between x = 0.5 and x = 1.]
Note that the area between 0 and 2 should be equal to 1, since the probability that your client arrives between noon and 2 pm is 1 (assuming that he keeps his promise to visit between noon and 2 pm).
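As a worked check (my own computation, using the density f(x) = 0.5 implied by the figure):

P(0.5 \le X \le 1) = \int_{0.5}^{1} 0.5\, dx = 0.5 \times (1 - 0.5) = 0.25

so there is a 25% chance that the client arrives between 12:30 and 1:00.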

Properties of PDF
f(x) ≥ 0 for any x.
The total area under f(x) is 1.


Cumulative Distribution Function (CDF)
The cumulative distribution function, F(x), for a continuous random variable X expresses the probability that X does not exceed the value x, as a function of x:

F(x) = P(X ≤ x)


[Figure: the cumulative distribution function F(x) = P(X ≤ x) is given by the shaded area under f(x) to the left of x.]

Properties of CDF

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a)


Relationship Between PDF & CDF
Let X be a continuous random variable. Then there is the following relationship between the probability density function and the cumulative distribution function:

P(a \le X \le b) = F(b) - F(a) = \int_a^b f(u)\, du
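A quick numerical sanity check of this relationship (my own sketch; the distribution and interval are arbitrary examples, using SciPy):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Example distribution: Normal with mean 25 and SD 3.2 (as in the earlier figure).
dist = norm(loc=25, scale=3.2)
a, b = 20.0, 30.0

# Left side: F(b) - F(a) from the CDF.
cdf_diff = dist.cdf(b) - dist.cdf(a)

# Right side: integral of the PDF f(u) from a to b.
integral, _ = quad(dist.pdf, a, b)

print(cdf_diff, integral)   # the two values agree (up to numerical error)
```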


Variance and Standard Deviation
Variance and standard deviation for a continuous random variable are defined as:

Var(X) = \sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x)\, dx

SD(X) = \sigma_X = \sqrt{Var(X)}
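To make this concrete (my own example), the sketch below evaluates the integral numerically for the uniform density f(x) = 0.5 on [0, 2] from the earlier client-visit example; the result matches the known value (2 − 0)²/12 = 1/3.

```python
import numpy as np
from scipy.integrate import quad

# Uniform density on [0, 2]: f(x) = 0.5
f = lambda x: 0.5

mean, _ = quad(lambda x: x * f(x), 0, 2)               # mu_X = 1.0
var, _ = quad(lambda x: (x - mean) ** 2 * f(x), 0, 2)  # integral of (x - mu_X)^2 f(x)
sd = np.sqrt(var)

print(mean, var, sd)   # 1.0, ~0.3333, ~0.5774
```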


Normal Distribution
A random variable X is said to be a normal random variable with mean μ and variance σ² if X has the following probability density function:

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}, \qquad -\infty < x < \infty

where e and π are mathematical constants, e = 2.71828... and π = 3.14159...
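As a sketch (my own check, with example parameters), the formula can be evaluated directly and compared against SciPy's implementation of the normal density:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 25.0, 3.2          # example parameters from the earlier figure
x = np.linspace(15, 35, 5)

# Normal PDF written out from the formula above.
f_manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# SciPy's implementation of the same density.
f_scipy = norm.pdf(x, loc=mu, scale=sigma)

print(np.allclose(f_manual, f_scipy))   # True
```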


[Figure: probability density function of the Normal distribution with mean 25 and SD 3.2, shown alongside the Binomial probability distribution function (success probability 0.5, 50 trials).]

Properties of Normal Distribution
Suppose that the random variable X follows the normal distribution given on the previous slide. Then:
The mean of the random variable is μ: E(X) = μ
The variance of the random variable is σ²: E[(X − μ_X)²] = σ²
The following notation means that a random variable has the normal distribution with mean μ and variance σ²:

X ~ N(μ, σ²)


[Figure: two panels. Left: normal distributions with the same variance but different means (Case 1 vs Case 2). Right: normal distributions with the same mean but different standard deviations (Case 1 vs Case 3).]
1. Changing the mean without changing the standard deviation causes a shift.
2. Increasing the standard deviation makes the shape flatter (fat tails).

Joint CDF
Joint cumulative distribution functions often have complex forms, especially when the variables are not independent.
X and Y are said to have an independent joint uniform distribution if X and Y have the following distribution function:

F(x, y) = xy, \quad 0 \le x \le 1,\ 0 \le y \le 1

X and Y are said to have an independent joint exponential distribution if the joint distribution function is given by:

F(x, y) = (1 - e^{-x})(1 - e^{-y}), \quad 0 < x < \infty,\ 0 < y < \infty

Properties of Joint CDF
Let X1, X2, ..., Xk be continuous random variables.
Their joint cumulative distribution function, F(x1, x2, ..., xk), defines the probability that simultaneously X1 is less than x1, X2 is less than x2, and so on; that is:

F(x_1, x_2, \ldots, x_k) = P(X_1 \le x_1 \cap X_2 \le x_2 \cap \cdots \cap X_k \le x_k)


Properties of Joint CDF (2)
The cumulative distribution functions F1(x1), F2(x2), ..., Fk(xk) of the individual random variables are called their marginal distribution functions. For any i, Fi(xi) is the probability that the random variable Xi does not exceed the specific value xi.
The random variables are independent if and only if:

F(x_1, x_2, \ldots, x_k) = F_1(x_1) F_2(x_2) \cdots F_k(x_k)

or equivalently

f(x_1, x_2, \ldots, x_k) = f_1(x_1) f_2(x_2) \cdots f_k(x_k)
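A small simulation sketch (my own, not from the slides) illustrates the factorization criterion for the independent joint exponential example above: the empirical joint CDF at a point is close to the product of the empirical marginal CDFs, and both match the formula (1 − e^{−x})(1 − e^{−y}).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Independent standard exponential samples for X and Y.
X = rng.exponential(scale=1.0, size=n)
Y = rng.exponential(scale=1.0, size=n)

x0, y0 = 1.0, 0.5
joint = np.mean((X <= x0) & (Y <= y0))            # empirical F(x0, y0)
product = np.mean(X <= x0) * np.mean(Y <= y0)     # empirical F_X(x0) * F_Y(y0)
theory = (1 - np.exp(-x0)) * (1 - np.exp(-y0))    # (1 - e^-x0)(1 - e^-y0)

print(joint, product, theory)   # all three approximately equal
```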

Covariance
Let X and Y be a pair of continuous random variables, with respective means μ_X and μ_Y. The expected value of (X − μ_X)(Y − μ_Y) is called the covariance between X and Y. That is:

Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]

An alternative but equivalent expression can be derived as:

Cov(X, Y) = E(XY) - \mu_X \mu_Y

If the random variables X and Y are independent, then the covariance between them is 0.


Correlation
Let X and Y be jointly distributed random variables. The correlation between X and Y is:

Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}

Correlation is classified as:
Autocorrelation
Cross-correlation
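A short sketch (my own example, with made-up data) estimating covariance and correlation from simulated samples with NumPy; the sample correlation equals the sample covariance divided by the product of the sample standard deviations, as in the formula above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

X = rng.normal(0.0, 1.0, size=n)
Y = 0.6 * X + rng.normal(0.0, 1.0, size=n)   # Y is correlated with X

cov_xy = np.cov(X, Y)[0, 1]                  # sample covariance Cov(X, Y)
corr_xy = np.corrcoef(X, Y)[0, 1]            # sample correlation

print(cov_xy, corr_xy)
# Corr(X, Y) = Cov(X, Y) / (sd_X * sd_Y):
print(cov_xy / (np.std(X, ddof=1) * np.std(Y, ddof=1)))
```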


Autocorrelation
Autocorrelation is the correlation of a signal with itself. Informally, it is the similarity between observations as a function of the time lag between them.
It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies.
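The sketch below (my own illustration, with an arbitrary 50-sample period) estimates the autocorrelation of a noisy periodic signal with NumPy; the hidden periodicity shows up as structure at lags related to the period.

```python
import numpy as np

rng = np.random.default_rng(2)
n, period = 1000, 50

t = np.arange(n)
signal = np.sin(2 * np.pi * t / period) + rng.normal(0, 1.0, size=n)  # periodic signal + noise

x = signal - signal.mean()
acf = np.correlate(x, x, mode="full")[n - 1:]   # autocorrelation estimate for lags 0 .. n-1
acf /= acf[0]                                   # normalize so acf[0] = 1

# Negative near half a period (lag 25), positive again near one full period (lag 50):
print(acf[25], acf[50])
```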


[Figure: application of correlation — a transmitted signal x(n) and its reflected signal y(n) = x(n − D) + w(n).]
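As an illustrative sketch of this setup (my own, with made-up parameters, using cross-correlation of the reflected and transmitted signals), the unknown delay D appears as the lag of the correlation peak:

```python
import numpy as np

rng = np.random.default_rng(3)
n, D = 500, 37                      # signal length and true delay

x = rng.normal(0, 1.0, size=n)                          # transmitted signal x(n)
y = np.roll(x, D) + 0.5 * rng.normal(0, 1.0, size=n)    # reflected: delayed copy (circular shift) + noise w(n)

# Cross-correlate y with x; the peak lag estimates the delay D.
corr = np.correlate(y, x, mode="full")
lags = np.arange(-n + 1, n)
D_hat = lags[np.argmax(corr)]

print(D_hat)   # 37 (with high probability)
```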
