
APPLIED STATISTICS

DR. S.L. LAHUDKAR


PROF., DEPT. OF ELECTRONICS & TELECOMMUNICATION
I.C.O.E.R, PUNE


Points to be Covered
What is Statistics?
Need for Statistics
Basic Terminologies
Fundamentals of Statistics
Probability Theory
Probability Density & Cumulative Distribution Functions (PDF & CDF)


Basic Engineering Model

[Flow diagram with the following stages: develop a clear description; identify important factors; propose or refine a model; conduct experiments; manipulate the model; confirm the solutions; conclusions & recommendations.]

What is Statistics?
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.

Two areas of statistics:
Descriptive Statistics: collection, presentation, and description of sample data.
Inferential Statistics: making decisions and drawing conclusions about populations.


Basic Terminologies
Population: A collection, or set, of individuals or objects or events whose properties are to be analyzed. Two kinds of populations: finite or infinite.
Sample: A subset of the population.


Variables
A variable is a characteristic or condition that can change or take on different values.
Most research begins with a general question about the relationship between two variables for a specific group of individuals.

Variables (2)
Qualitative, or Attribute, or Categorical, Variable: A variable that categorizes or describes an element of a population.
Quantitative, or Numerical, Variable: A variable that quantifies an element of a population.


Classification of Variables

[Tree diagram] Variable → Qualitative (Nominal, Ordinal) or Quantitative (Discrete, Continuous).

Variables (3)
Variables can be classified as discrete or continuous.
Discrete variables (such as class size) consist of indivisible categories.
Continuous variables (such as time or weight) are infinitely divisible into whatever units a researcher may choose. For example, time can be measured to the nearest minute, second, half-second, etc.


Measuring Variables
To establish relationships between variables, researchers must observe the variables and record their observations. This requires that the variables be measured.
The process of measuring a variable requires a set of categories called a scale of measurement and a process that classifies each individual into one category.


Measure & Variability
No matter what the response variable, there will always be variability in the data.
One of the primary objectives of statistics: measuring and characterizing variability.
Controlling (or reducing) variability in a manufacturing process: statistical process control.


Example: A supplier fills cans of soda marked 12 ounces. How much soda does each can really contain?
It is very unlikely any one can contains exactly 12 ounces of soda. There is variability in any process.
Some cans contain a little more than 12 ounces, and some cans contain a little less.
On average, there are 12 ounces in each can.
The supplier hopes there is little variability in the process, i.e., that most cans contain close to 12 ounces of soda.



Correlational Studies
The goal of a correlational study is to determine whether there is a relationship between two variables and to describe the relationship.
A correlational study simply observes the two variables as they exist naturally.


Data Collection
First problem a statistician faces: how to obtain the data.
It is important to obtain good, or representative, data.
Inferences are made based on statistics obtained from the data.
Inferences can only be as good as the data.

Process of Data Collection
Define the objectives of the survey or experiment.
  Example: Estimate the average life of an electronic component.
Define the variable and population of interest.
  Example: Length of time for anesthesia to wear off after surgery.
Define the data-collection and data-measuring schemes. This includes sampling procedures, sample size, and the data-measuring device (questionnaire, scale, ruler, etc.).
Determine the appropriate descriptive or inferential data-analysis techniques.


Methods of Data Collection
Experiment: The investigator controls or modifies the environment and observes the effect on the variable under study.
Survey: Data are obtained by sampling some of the population of interest. The investigator does not modify the environment.
Census: A 100% survey. Every element of the population is listed. Seldom used: difficult and time-consuming to compile, and expensive.


Experiments
The goal of an experiment is to demonstrate a cause-and-effect relationship between two variables; that is, to show that changing the value of one variable causes changes to occur in a second variable.


Experiments (cont.)
In an experiment, one variable is manipulated to create treatment conditions. A second variable is observed and measured to obtain scores for a group of individuals in each of the treatment conditions.
In an experiment, the manipulated variable is called the independent variable and the observed variable is the dependent variable.


Other Types of Studies
Other types of research studies, known as non-experimental or quasi-experimental, are similar to experiments because they also compare groups of scores.
These studies do not use a manipulated variable to differentiate the groups. Instead, the variable that differentiates the groups is usually a pre-existing participant variable (such as male/female) or a time variable (such as before/after).


Data
The measurements obtained in a research study are called the data.
The goal of statistics is to help researchers organize and interpret the data.


Descriptive Statistics
Descriptive statistics are methods for organizing and summarizing data.
For example, tables or graphs are used to organize data, and descriptive values such as the average score are used to summarize data.
A descriptive value for a population is called a parameter and a descriptive value for a sample is called a statistic.


Inferential Statistics
Inferential statistics are methods for using sample data to make general conclusions (inferences) about populations.
Because a sample is typically only a part of the whole population, sample data provide only limited information about the population. As a result, sample statistics are generally imperfect representatives of the corresponding population parameters.

Sampling
Sampling Frame: A list of the elements
belonging to the population from which the
sample will be drawn.
Sample Design: The process of selecting
sample elements from the sampling frame.


Types of Samples
Judgment Samples: Samples that are selected on the basis of being typical. Items are selected that are representative of the population. The validity of the results from a judgment sample reflects the soundness of the collector's judgment.
Probability Samples: Samples in which the elements to be selected are drawn on the basis of probability. Each element in a population has a certain probability of being selected as part of the sample.


Types of Samples (2)
Random Samples: A sample selected in such a way that every element in the population has an equal probability of being chosen. Equivalently, all samples of size n have an equal chance of being selected. Random samples are obtained either by sampling with replacement from a finite population or by sampling without replacement from an infinite population.


Types of Samples (3)
Systematic Sample: A sample in which every k-th item of the sampling frame is selected, starting from the first element, which is randomly selected from the first k elements.
Stratified Random Sample: A sample obtained by stratifying the sampling frame and then selecting a fixed number of items from each of the strata by means of a simple random sampling technique.


Sampling Error
The discrepancy between a sample statistic and its population parameter is called sampling error.
Defining and measuring sampling error is a large part of inferential statistics.


Notation
The individual measurements or scores obtained for a research participant will be identified by the letter X (or X and Y if there are multiple scores for each individual).
The number of scores in a data set will be identified by N for a population or n for a sample.
Summing a set of values is a common operation in statistics and has its own notation. The Greek letter sigma, Σ, will be used to stand for "the sum of." For example, ΣX identifies the sum of the scores.


Probability & Statistics
Probability: Properties of the population are assumed known. Answer questions about the sample based on these properties.
Statistics: Use information in the sample to draw a conclusion about the population.


Probability
The probability of an outcome is a number between 0 and 1 that measures the likelihood that the outcome will occur when the experiment is performed (0 = impossible, 1 = certain).
The probabilities of all sample points must sum to 1.


Events
An event is a specific collection of sample points.
The probability of an event A is calculated by summing the probabilities of the outcomes in the sample space for A.


Steps for Calculating Probabilities
1. Define the experiment.
2. List the sample points.
3. Assign the probabilities to the sample points.
4. Determine the collection of sample points contained in the event of interest.
5. Sum the sample point probabilities to get the event probability.


Example
In craps one rolls two fair dice. What is the probability of the sum of the two dice showing 7?


Sample space (36 equally likely outcomes):
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

Outcomes with sum 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1) — each with probability of occurrence 1/36.

Outcome
So the probability of rolling a sum of 7 with two dice is 6/36 = 1/6.
This example illustrates the following rule: in a sample space S of equally likely outcomes, the probability of an event A is given by

P(A) = #A / #S

that is, the number of outcomes in A divided by the total number of outcomes in S.
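The rule above is easy to check by brute-force enumeration. The short Python sketch below (my own illustration, not from the slides) lists the 36 sample points, collects the event A = "sum is 7", and applies P(A) = #A / #S.

```python
from itertools import product
from fractions import Fraction

# Sample space S: all ordered pairs from two fair dice (36 equally likely points).
S = list(product(range(1, 7), repeat=2))

# Event A: sample points whose faces sum to 7.
A = [outcome for outcome in S if sum(outcome) == 7]

# P(A) = #A / #S for equally likely outcomes.
p_A = Fraction(len(A), len(S))
print(A)    # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(p_A)  # 1/6
```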


Set Theory
Aᶜ: The complement of A is the event that A does not occur.
A ∪ B: The union of two events A and B is the event that occurs if either A or B or both occur; it consists of all sample points that belong to A or B or both.
A ∩ B: The intersection of two events A and B is the event that occurs if both A and B occur; it consists of all sample points that belong to both A and B.


Basic Probability Rules
P(Aᶜ) = 1 − P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Mutually exclusive events are events which cannot occur at the same time; P(A ∩ B) = 0 for mutually exclusive events.
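As a quick illustration (my own example, not from the slides), the sketch below checks the complement and addition rules on two dice events: A = "sum is 7" and B = "first die shows 1".

```python
from itertools import product

S = list(product(range(1, 7), repeat=2))
prob = lambda event: len(event) / len(S)   # equally likely outcomes

A = {s for s in S if sum(s) == 7}          # sum is 7
B = {s for s in S if s[0] == 1}            # first die shows 1

# Complement rule: P(A^c) = 1 - P(A)
assert abs(prob(set(S) - A) - (1 - prob(A))) < 1e-12

# Addition rule: P(A u B) = P(A) + P(B) - P(A n B)
assert abs(prob(A | B) - (prob(A) + prob(B) - prob(A & B))) < 1e-12
```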


Conditional Probability
P(A | B): the probability of A occurring given that B has occurred.
P(A | B) = P(A ∩ B) / P(B)
Multiplicative rule:
P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A)
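A small numerical check (my own example, not from the slides): with the dice events used earlier, P(sum is 7 | first die is 1) should equal P(A ∩ B) / P(B) = (1/36) / (1/6) = 1/6.

```python
from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))
prob = lambda event: Fraction(len(event), len(S))

A = {s for s in S if sum(s) == 7}    # sum is 7
B = {s for s in S if s[0] == 1}      # first die shows 1

p_A_given_B = prob(A & B) / prob(B)  # conditional probability P(A|B)
print(p_A_given_B)                   # 1/6

# Multiplicative rule: P(A n B) = P(A|B) P(B)
assert prob(A & B) == p_A_given_B * prob(B)
```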

Independent Events
A and B are independent events if the occurrence of one event does not affect the probability of the other event.
If A and B are independent, then
P(A | B) = P(A)
P(B | A) = P(B)
P(A ∩ B) = P(A) P(B)
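Continuing the dice illustration (again my own example): the events A = "sum is 7" and B = "first die shows 1" turn out to be independent, since P(A ∩ B) = 1/36 = P(A) P(B).

```python
from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))
prob = lambda event: Fraction(len(event), len(S))

A = {s for s in S if sum(s) == 7}
B = {s for s in S if s[0] == 1}

print(prob(A & B))        # 1/36
print(prob(A) * prob(B))  # 1/36  -> the product rule holds, so A and B are independent
assert prob(A & B) == prob(A) * prob(B)
```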


Continuous Random Variable
A random variable is continuous if it can take any value in an interval.
For a continuous random variable, we use the following two concepts to describe the probability distribution:
Probability Density Function (PDF)
Cumulative Distribution Function (CDF)


Probability Density Function (PDF)
The probability density function is a similar concept to the probability distribution function for a discrete random variable.
You can consider the probability density function as a smoothed probability distribution function.
Let X be a continuous random variable and let x denote the value that the random variable X takes. We use f(x) to denote the probability density function.


[Figure: comparing the probability distribution function of a discrete r.v. (Binomial, success probability 0.5, 50 trials) with the probability density function of a continuous r.v. (Normal, mean 25 and SD 3.2).]


The probability density function f(x) shows the probability that the random variable falls in a particular range (as the area under the curve over that range).


Example:
Your client told you that he will visit you between noon and 2 pm. Within that window, the time he arrives at your company is totally random. Let X be the random variable for the time he arrives (X = 1.5 means he visits your office at 1:30 pm).
Let x be the possible value for the random variable X. Then the probability density function f(x) has the following shape.


[Figure: f(x) = 0.5 for 0 ≤ x ≤ 2, and 0 otherwise. The probability that your client visits your office between 12:30 and 1:00 is given by the shaded area under f(x) between x = 0.5 and x = 1.]
Note that the area between 0 and 2 should be equal to 1, since the probability that your client arrives between noon and 2 pm is 1 (assuming that he keeps his promise to visit between noon and 2 pm).
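As a worked check (my own computation, using the density f(x) = 0.5 implied by the figure):

P(0.5 \le X \le 1) = \int_{0.5}^{1} 0.5\, dx = 0.5 \times (1 - 0.5) = 0.25

so there is a 25% chance that the client arrives between 12:30 and 1:00.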

Properties of PDF
f(x) ≥ 0 for any x.
The total area under f(x) is 1.


Cumulative Distribution Function (CDF)
The cumulative distribution function, F(x), for a continuous random variable X expresses the probability that X does not exceed the value x, as a function of x:

F(x) = P(X ≤ x)


[Figure: the cumulative distribution function F(x) = P(X ≤ x) is given by the shaded area under f(x) to the left of x.]

Properties of CDF

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a)


Relationship Between PDF & CDF
Let X be a continuous random variable. Then there is the following relationship between the probability density function and the cumulative distribution function:

P(a \le X \le b) = F(b) - F(a) = \int_a^b f(u)\, du
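A quick numerical sanity check of this relationship (my own sketch; the distribution and interval are arbitrary examples, using SciPy):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Example distribution: Normal with mean 25 and SD 3.2 (as in the earlier figure).
dist = norm(loc=25, scale=3.2)
a, b = 20.0, 30.0

# Left side: F(b) - F(a) from the CDF.
cdf_diff = dist.cdf(b) - dist.cdf(a)

# Right side: integral of the PDF f(u) from a to b.
integral, _ = quad(dist.pdf, a, b)

print(cdf_diff, integral)   # the two values agree (up to numerical error)
```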


Variance and Standard Deviation
Variance and standard deviation for a continuous random variable are defined as:

Var(X) = \sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x)\, dx

SD(X) = \sigma_X = \sqrt{Var(X)}
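To make this concrete (my own example), the sketch below evaluates the integral numerically for the uniform density f(x) = 0.5 on [0, 2] from the earlier client-visit example; the result matches the known value (2 − 0)²/12 = 1/3.

```python
import numpy as np
from scipy.integrate import quad

# Uniform density on [0, 2]: f(x) = 0.5
f = lambda x: 0.5

mean, _ = quad(lambda x: x * f(x), 0, 2)               # mu_X = 1.0
var, _ = quad(lambda x: (x - mean) ** 2 * f(x), 0, 2)  # integral of (x - mu_X)^2 f(x)
sd = np.sqrt(var)

print(mean, var, sd)   # 1.0, ~0.3333, ~0.5774
```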


Normal Distribution
A random variable X is said to be a normal random variable with mean μ and variance σ² if X has the following probability density function:

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}, \qquad -\infty < x < \infty

where e and π are mathematical constants, e = 2.71828... and π = 3.14159...
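As a sketch (my own check, with example parameters), the formula can be evaluated directly and compared against SciPy's implementation of the normal density:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 25.0, 3.2          # example parameters from the earlier figure
x = np.linspace(15, 35, 5)

# Normal PDF written out from the formula above.
f_manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# SciPy's implementation of the same density.
f_scipy = norm.pdf(x, loc=mu, scale=sigma)

print(np.allclose(f_manual, f_scipy))   # True
```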


[Figure: probability density function of the Normal distribution with mean 25 and SD 3.2, shown alongside the Binomial probability distribution function (success probability 0.5, 50 trials).]

Properties of Normal Distribution
Suppose that the random variable X follows the normal distribution given on the previous slide. Then:
The mean of the random variable is μ: E(X) = μ
The variance of the random variable is σ²: E[(X − μ_X)²] = σ²
The following notation means that a random variable has the normal distribution with mean μ and variance σ²:

X ~ N(μ, σ²)


[Figure: two panels. Left: normal distributions with the same variance but different means (Case 1 vs Case 2). Right: normal distributions with the same mean but different standard deviations (Case 1 vs Case 3).]
1. Changing the mean without changing the standard deviation causes a shift.
2. Increasing the standard deviation makes the shape flatter (fat tails).

Joint CDF
Joint cumulative distribution functions often have complex forms, especially when the variables are not independent.
X and Y are said to have an independent joint uniform distribution if X and Y have the following distribution function:

F(x, y) = xy, \quad 0 \le x \le 1,\ 0 \le y \le 1

X and Y are said to have an independent joint exponential distribution if the joint distribution function is given by:

F(x, y) = (1 - e^{-x})(1 - e^{-y}), \quad 0 < x < \infty,\ 0 < y < \infty

Properties of Joint CDF
Let X1, X2, ..., Xk be continuous random variables.
Their joint cumulative distribution function, F(x1, x2, ..., xk), defines the probability that simultaneously X1 is less than x1, X2 is less than x2, and so on; that is:

F(x_1, x_2, \ldots, x_k) = P(X_1 \le x_1 \cap X_2 \le x_2 \cap \cdots \cap X_k \le x_k)


Properties of Joint CDF (2)
The cumulative distribution functions F1(x1), F2(x2), ..., Fk(xk) of the individual random variables are called their marginal distribution functions. For any i, Fi(xi) is the probability that the random variable Xi does not exceed the specific value xi.
The random variables are independent if and only if:

F(x_1, x_2, \ldots, x_k) = F_1(x_1) F_2(x_2) \cdots F_k(x_k)

or equivalently

f(x_1, x_2, \ldots, x_k) = f_1(x_1) f_2(x_2) \cdots f_k(x_k)
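A small simulation sketch (my own, not from the slides) illustrates the factorization criterion for the independent joint exponential example above: the empirical joint CDF at a point is close to the product of the empirical marginal CDFs, and both match the formula (1 − e^{−x})(1 − e^{−y}).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Independent standard exponential samples for X and Y.
X = rng.exponential(scale=1.0, size=n)
Y = rng.exponential(scale=1.0, size=n)

x0, y0 = 1.0, 0.5
joint = np.mean((X <= x0) & (Y <= y0))            # empirical F(x0, y0)
product = np.mean(X <= x0) * np.mean(Y <= y0)     # empirical F_X(x0) * F_Y(y0)
theory = (1 - np.exp(-x0)) * (1 - np.exp(-y0))    # (1 - e^-x0)(1 - e^-y0)

print(joint, product, theory)   # all three approximately equal
```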

Covariance
Let X and Y be a pair of continuous random variables, with respective means μ_X and μ_Y. The expected value of (X − μ_X)(Y − μ_Y) is called the covariance between X and Y. That is:

Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]

An alternative but equivalent expression can be derived as:

Cov(X, Y) = E(XY) - \mu_X \mu_Y

If the random variables X and Y are independent, then the covariance between them is 0.


Correlation
Let X and Y be jointly distributed random variables. The correlation between X and Y is:

Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}

Correlation is classified as:
Autocorrelation
Cross-correlation
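A short sketch (my own example, with made-up data) estimating covariance and correlation from simulated samples with NumPy; the sample correlation equals the sample covariance divided by the product of the sample standard deviations, as in the formula above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

X = rng.normal(0.0, 1.0, size=n)
Y = 0.6 * X + rng.normal(0.0, 1.0, size=n)   # Y is correlated with X

cov_xy = np.cov(X, Y)[0, 1]                  # sample covariance Cov(X, Y)
corr_xy = np.corrcoef(X, Y)[0, 1]            # sample correlation

print(cov_xy, corr_xy)
# Corr(X, Y) = Cov(X, Y) / (sd_X * sd_Y):
print(cov_xy / (np.std(X, ddof=1) * np.std(Y, ddof=1)))
```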


Autocorrelation
Autocorrelation is the correlation of a signal with itself. Informally, it is the similarity between observations as a function of the time lag between them.
It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies.
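The sketch below (my own illustration, with an arbitrary 50-sample period) estimates the autocorrelation of a noisy periodic signal with NumPy; the hidden periodicity shows up as structure at lags related to the period.

```python
import numpy as np

rng = np.random.default_rng(2)
n, period = 1000, 50

t = np.arange(n)
signal = np.sin(2 * np.pi * t / period) + rng.normal(0, 1.0, size=n)  # periodic signal + noise

x = signal - signal.mean()
acf = np.correlate(x, x, mode="full")[n - 1:]   # autocorrelation estimate for lags 0 .. n-1
acf /= acf[0]                                   # normalize so acf[0] = 1

# Negative near half a period (lag 25), positive again near one full period (lag 50):
print(acf[25], acf[50])
```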


[Figure: application of correlation — a transmitted signal x(n) and its reflected signal y(n) = x(n − D) + w(n).]
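As an illustrative sketch of this setup (my own, with made-up parameters, using cross-correlation of the reflected and transmitted signals), the unknown delay D appears as the lag of the correlation peak:

```python
import numpy as np

rng = np.random.default_rng(3)
n, D = 500, 37                      # signal length and true delay

x = rng.normal(0, 1.0, size=n)                          # transmitted signal x(n)
y = np.roll(x, D) + 0.5 * rng.normal(0, 1.0, size=n)    # reflected: delayed copy (circular shift) + noise w(n)

# Cross-correlate y with x; the peak lag estimates the delay D.
corr = np.correlate(y, x, mode="full")
lags = np.arange(-n + 1, n)
D_hat = lags[np.argmax(corr)]

print(D_hat)   # 37 (with high probability)
```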
