Linear regression
Introduction

$f$ typically depends on parameters to be estimated based on a sample of observations of $Y$ and $X_1, \dots, X_p$:

$$
\begin{array}{c|ccc}
Y & X_1 & \cdots & X_p \\ \hline
y_1 & x_{1,1} & \cdots & x_{p,1} \\
\vdots & \vdots & & \vdots \\
y_N & x_{1,N} & \cdots & x_{p,N}
\end{array}
$$

Estimating the parameters in $f$ yields the empirical model
$$\hat{Y} = \hat{f}(X_1, \dots, X_p),$$
i.e. the formula for obtaining $\hat{y}_1$ from $x_{1,1}, \dots, x_{p,1}$, ..., $\hat{y}_N$ from $x_{1,N}, \dots, x_{p,N}$.

Typically the parameters are estimated to make the differences $y_1 - \hat{y}_1, \dots, y_N - \hat{y}_N$ as small as possible.

(University of Brescia) 25 / 42

The linear regression model

The linear regression model is specified by the formula
$$Y = \beta_0 + \sum_{k=1}^{p} \beta_k X_k + \varepsilon$$

Each parameter $\beta_k$ tells how the dependent variable $Y$ changes as $X_k$ changes by 1 unit, keeping all other independent variables fixed.

Parameters $\beta_0, \dots, \beta_p$ are typically estimated with the least squares method to yield the empirical model
$$\hat{Y} = \hat{\beta}_0 + \sum_{k=1}^{p} \hat{\beta}_k X_k$$

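As a minimal sketch of the least squares method named above, the following Python snippet estimates $\beta_0, \dots, \beta_p$ with NumPy. The data, the helper name `fit_least_squares`, and the use of `numpy.linalg.lstsq` are illustrative assumptions, not part of the original slides.

```python
import numpy as np


def fit_least_squares(X, y):
    """Return the least squares estimates (beta_0, beta_1, ..., beta_p).

    X has shape (N, p): one row per observation, one column per X_k.
    y has shape (N,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that beta_0 acts as the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta_hat


# Made-up data generated exactly by Y = 1 + 2*X_1 - 3*X_2 (no error term).
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0], [4.0, 3.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

beta_hat = fit_least_squares(X, y)
y_hat = beta_hat[0] + X @ beta_hat[1:]  # empirical model applied to the sample
```

Because the made-up data contain no error term, the estimates recover the generating coefficients up to floating-point rounding; with real data the differences $y_i - \hat{y}_i$ would not vanish.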
Consider now the $i$-th linear regression model
$$Y_i = \beta_0 + \sum_{k=1}^{p} \beta_k X_{k,i} + \varepsilon_i, \quad i = 1, \dots, N,$$
i.e. the model referred to the $i$-th observation of the variables, independently of the values $y_i, x_{1,i}, \dots, x_{p,i}, e_i$ that will be observed.

Classical assumptions about the linear regression model are
  $E(\varepsilon_i) = 0$
  $Var(\varepsilon_i) = \sigma^2$
  $Cov(\varepsilon_i, \varepsilon_j) = 0$, $j = 1, \dots, N$, $i \neq j$
  $X_{1,i}, \dots, X_{p,i}$ are not random, i.e. they are fixed at the observed values $x_{1,i}, \dots, x_{p,i}$

To assess the fit of the linear regression model, many indicators have been developed. We look at the R square indicator, $R^2$.

The better the model adapts to the data, the lower the error variability will be with respect to the total variability.

The formula is
$$R^2 = 1 - \frac{RSS}{TSS},$$
where $RSS$ is the Residual Sum of Squares and $TSS$ is the Total Sum of Squares.

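The slide does not spell out the two sums of squares. Under the usual definitions (stated here as an assumption about the intended meaning), with $\hat{y}_i$ the fitted values and $\bar{y}$ the sample mean of $y_1, \dots, y_N$, they read:

```latex
\[
RSS = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \qquad
TSS = \sum_{i=1}^{N} (y_i - \bar{y})^2, \qquad
R^2 = 1 - \frac{RSS}{TSS}.
\]
```

A perfect fit gives $RSS = 0$ and hence $R^2 = 1$; a model that does no better than predicting $\bar{y}$ for every observation gives $RSS = TSS$ and $R^2 = 0$.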
Notice that $R^2 \in [0, 1]$ and the better the fit is, the closer $R^2$ is to 1.

Notice also that $R^2$ never decreases as $p$, the number of independent variables, increases. Thus, an adjusted R square indicator has been proposed:
$$R^2_{adj} = 1 - \frac{RSS / (N - p - 1)}{TSS / (N - 1)}.$$

The numbers $N - p - 1$ and $N - 1$ are also called Residual Degrees of Freedom and Total Degrees of Freedom, respectively.

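The two indicators can be computed directly from observed and fitted values. The sketch below is illustrative only: the function name `r_squared` and the numbers are made up, and the fitted values are hypothetical rather than the output of an actual least squares fit.

```python
import numpy as np


def r_squared(y, y_hat, p):
    """Return (R^2, adjusted R^2) for a model with p independent variables."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)      # Residual Sum of Squares
    tss = np.sum((y - y.mean()) ** 2)   # Total Sum of Squares
    r2 = 1.0 - rss / tss
    r2_adj = 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))
    return r2, r2_adj


# Made-up observations and hypothetical fitted values, one-variable model.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.2, 1.9, 3.1, 3.8, 5.0]
r2, r2_adj = r_squared(y, y_hat, p=1)
```

Here $RSS = 0.10$ and $TSS = 10$, so $R^2 = 0.99$, while the adjusted value is slightly lower because it penalizes the degree of freedom spent on the independent variable.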
The linear regression model at work: Excel Analysis ToolPak regression routine

Consider the data of the "Regression" sheet.

In Excel, choose Tools -> Data Analysis -> Regression:
  input the ranges of the observations of the Y and X variables. In our case, the Y variable is the return of stock A and the X variable is the return of Index SPP
  specify the top-left cell where you would like the output to appear. In our case, it is cell F7
  check the Residuals option

Notice the values of cells R Square, Adjusted R Square, the column SS including RSS and TSS, and the column Coefficients including the estimates of parameters $\beta_0$ (Intercept) and $\beta_1$ (X Variable 1).

The cell Standard Error is the estimate of the standard deviation of the model residuals; its formula is
$$\sqrt{\hat{\sigma}^2} = \sqrt{\frac{RSS}{N - 2}} = \sqrt{\frac{\mathrm{var}(\text{Residuals})\,(N - 1)}{N - 2}}$$

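To see that the two expressions for the Standard Error cell agree, the following Python sketch fits a one-variable regression and evaluates both. The data are made up and merely stand in for the "Regression" sheet; `var` is taken to be the sample variance with divisor $N - 1$, which is an assumption about the slide's notation.

```python
import numpy as np

# Made-up returns (in %) standing in for the "Regression" sheet data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical index returns
y = np.array([1.1, 1.9, 3.2, 3.8, 5.0])   # hypothetical stock returns

# Least squares fit of y = beta_0 + beta_1 * x.
design = np.column_stack([np.ones_like(x), x])
(beta_0, beta_1), *_ = np.linalg.lstsq(design, y, rcond=None)

residuals = y - (beta_0 + beta_1 * x)
n = len(y)
rss = np.sum(residuals ** 2)

# Two equivalent ways of writing the standard error of the regression.
se_from_rss = np.sqrt(rss / (n - 2))
se_from_var = np.sqrt(np.var(residuals, ddof=1) * (n - 1) / (n - 2))
```

The two values coincide because least squares residuals from a model with an intercept average to zero, so their sample variance equals $RSS / (N - 1)$.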
Probability
Introduction

Decision-making means choosing between two or more alternatives. Good decision-making is based on evaluating which alternative has the best chance of succeeding. When managers refer to the chance of something occurring, they are using probability in the decision-making process.

Probability is the chance (expressed with a real number $p \in [0, 1]$) that something (an event) will happen.

[Figure: the five Platonic solids and two trapezohedrons.]

The old, unsatisfactory classical definition gave the probability of an event, provided that all outcomes of an experiment are equally likely, as
$$P(\text{event}) = \frac{\text{number of outcomes realizing the event}}{\text{total number of outcomes}}$$

The frequentist definition improves on this weakness of the classical one. It defines the probability of an event as a limit, namely the limit of the ratio between the number of times the event happens and the total number of observations:
$$\lim_{n \to \infty} \frac{n_A}{n} = P(A)$$

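The frequentist ratio $n_A / n$ can be illustrated by simulation. The sketch below rolls a fair six-sided die and tracks the relative frequency of the event "a six"; the die, the function name `relative_frequency`, and the sample sizes are illustrative assumptions.

```python
import random


def relative_frequency(n, seed=0):
    """Roll a fair six-sided die n times and return n_A / n for A = 'a six'."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n_a = sum(1 for _ in range(n) if rng.randint(1, 6) == 6)
    return n_a / n


classical = 1 / 6  # one favourable outcome out of six equally likely ones
frequencies = {n: relative_frequency(n) for n in (100, 10_000, 200_000)}
```

As the number of rolls grows, the relative frequency settles near the classical value 1/6, which is what the limit expresses.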
Probability distributions

A probability distribution is a way of recording how probability distributes over each event linked to an experiment. We will briefly discuss four probability distributions. They are real probability distributions because they are linked to events represented by sets of real numbers:
  the binomial distribution, which is a discrete distribution
  the Poisson distribution, which is a discrete distribution often used to count the number of occurrences of some event in a given period of time
  the exponential distribution, which is a continuous distribution used to measure the length of time needed to perform some activity
  the important continuous distribution known as the normal distribution

The cumulative probability distribution of real-defined events is the probability of events defined by the interval $(-\infty, x]$, with $x \in \mathbb{R}$. As $x \to +\infty$ we see that $P((-\infty, x]) \to 1$.

SellEvryThing Company: BINOMDIST, Excel binomial distribution function

A salesman at SellEvryThing Company makes twenty calls per day to randomly selected homes. The probability of the salesman making a sale is 0.1, i.e.
$$P(1) = 0.1,$$
where 1 indicates the event "a phone call ends with a sale".

[Figure: the client answering the call of SellEvryThing Company.]

Given that the probability of a successful outcome is $p$, the binomial distribution indicates the probability of succeeding $q$ times over $n$ trials ($n \geq q$):
$$P(q) = \binom{n}{q} p^q (1 - p)^{n - q} = \frac{n!}{q!\,(n - q)!}\, p^q (1 - p)^{n - q}$$

Each trial is independent of the others.
Each trial is also called a Bernoulli trial.
Each Bernoulli trial has two outcomes: success or failure.

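The formula translates almost literally into code. The following Python sketch uses the standard-library `math.comb` for the binomial coefficient; the function name `binomial_pmf` and the coin-toss example are illustrative assumptions.

```python
from math import comb


def binomial_pmf(q, n, p):
    """Probability of exactly q successes in n independent Bernoulli trials."""
    if not 0 <= q <= n:
        return 0.0
    return comb(n, q) * p**q * (1 - p) ** (n - q)


# Two fair coin tosses: P(0), P(1), P(2) heads.
probabilities = [binomial_pmf(q, n=2, p=0.5) for q in range(3)]
```

For two fair tosses the three probabilities are 0.25, 0.5 and 0.25, and they sum to 1 as any probability distribution must.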
Use the Excel binomial distribution function BINOMDIST to find the probability of
  no sales, i.e. $P(0)$
  four sales, i.e. $P(4)$
  more than four sales, i.e.
  $$\sum_{i=5}^{20} P(i) = P(5) + P(6) + \dots + P(20) = 1 - P(0) - P(1) - P(2) - P(3) - P(4) = 1 - \sum_{i=0}^{4} P(i) = 1 - P((-\infty, 4])$$
  four or more sales, i.e.
  $$\sum_{i=4}^{20} P(i) = 1 - \sum_{i=0}^{3} P(i) = 1 - P((-\infty, 3])$$

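For readers without Excel, the four probabilities can be reproduced with a small Python stand-in. The helper `binom_dist` below is a hypothetical function written for this sketch; it only imitates the argument order of Excel's `BINOMDIST(number_s, trials, probability_s, cumulative)`.

```python
from math import comb


def binom_dist(q, n, p, cumulative):
    """Mimic BINOMDIST: P(q) if cumulative is False, else P(0) + ... + P(q)."""
    def pmf(k):
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    if cumulative:
        return sum(pmf(k) for k in range(q + 1))
    return pmf(q)


n, p = 20, 0.1  # twenty calls per day, probability of a sale 0.1

p_no_sales = binom_dist(0, n, p, cumulative=False)           # P(0)
p_four_sales = binom_dist(4, n, p, cumulative=False)         # P(4)
p_more_than_four = 1 - binom_dist(4, n, p, cumulative=True)  # 1 - P((-inf, 4])
p_four_or_more = 1 - binom_dist(3, n, p, cumulative=True)    # 1 - P((-inf, 3])
```

Rounded to four decimals, the results are approximately 0.1216, 0.0898, 0.0432 and 0.1330. The last two differ exactly by $P(4)$, as the sums on the slide imply.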
Inference
Introduction and sampling distribution

Inferential statistics, usually abbreviated to inference, is a process by which conclusions about the features of a population are reached on the basis of examining only a part of it. Think of an opinion poll that is used to predict the voting pattern of a country's population during an election.

A quality-control manager will take a random sample of products and, if the number of defective items is found to be too high, the entire batch will be rejected. Label each defective item of the sample with 1 and each compliant item with 0. The sample mean of the 1s and 0s is the proportion of defective items.

When the mean is calculated from a sample, the observed value, $\bar{X}$, depends on which sample was extracted (of the many possible samples that could be chosen).

Two samples from the same population are likely to have different sample means, therefore possibly leading to different conclusions.

Managers need to understand how sample means are distributed over all the samples that could be drawn from the population, i.e. the sampling mean distribution.

AstroReturns Company: sampling mean distribution

The investment manager of AstroReturns Company has been asked by a client to formulate a hypothesis about the average return on his portfolio investment of six stocks at the end of the next year.

[Figure: how stock markets are perceived nowadays more than ever.]

The manager has the following data about the six stocks' returns (in %) realized in the last year

Stock        A   B   C   D   E   F
Return (%)   8  11  -3  18   3   5

Let's illustrate the concept of sampling error. The investment manager is shrewd and he will base his report on the best mean return of a sample of three stocks from the six available, i.e. stocks D, B, and A. The hypothesis he formulates to the client is:
  next year's return on your portfolio investment of stocks A, B, C, D, E, and F will be
  $$\frac{18 + 11 + 8}{3} = 12.33\%$$

However, the client is not dumb, because he knows that this is only one of the 20 possible outcomes. See the following table

Stock sample   Mean return (%)   Stock sample   Mean return (%)
CEF             1.67             ABE             7.33
ACE             2.67             ACD             7.67
ACF             3.33             ABF             8.00
BCE             3.67             BCD             8.67
BCF             4.33             DEF             8.67
ABC             5.33             ADE             9.67
AEF             5.33             ADF            10.33
CDE             6.00             BDE            10.67
BEF             6.33             BDF            11.33
CDF             6.67             ABD            12.33

where the 20 outcomes are the combinations (without repetition) of 6 objects taken 3 at a time, $\binom{6}{3} = 20$.

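The table of 20 sample means can be regenerated by enumerating the combinations. The Python sketch below uses the standard-library `itertools.combinations`; it assumes stock C returned -3%, the value consistent with every mean in the table (for example, C, E and F average 1.67 only if C is -3).

```python
from itertools import combinations

# Last year's returns (%). Stock C is taken as -3, the value implied by
# the table of sample means (for example, CEF averages 1.67).
returns = {"A": 8, "B": 11, "C": -3, "D": 18, "E": 3, "F": 5}

# All samples of three stocks out of six, with their mean return.
sample_means = {
    "".join(stocks): sum(returns[s] for s in stocks) / 3
    for stocks in combinations(sorted(returns), 3)
}

# List the samples from the worst to the best mean return.
for name, mean in sorted(sample_means.items(), key=lambda item: item[1]):
    print(f"{name}  {mean:5.2f}")

population_mean = sum(returns.values()) / len(returns)
mean_of_sample_means = sum(sample_means.values()) / len(sample_means)
```

Under that assumption the population mean is 7%, and so is the average of the 20 sample means, while the manager's chosen sample ABD sits at the extreme upper end with 12.33%.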
To avoid "losing his fees" for having been too optimistic, the investment manager should organize the sample means to have a clearer picture, i.e. he needs to draw the sampling mean distribution.

Exercise: use the Excel FREQUENCY function and ChartWizard to represent the frequency distribution of the sample mean by grouping the data into 5 intervals. Notice that the mean of the sample mean distribution is equal to the mean of the population.

Exercise: repeat the previous exercise on a table of means of samples of four stocks.

Exercise: repeat the previous exercise on a table of means of samples of five stocks.

Notice that the chart of the sampling mean distribution tends to become more bell-shaped.

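The three exercises can also be sketched outside Excel. The Python code below groups the sample means into 5 equal-width intervals for samples of three, four and five stocks. It is an illustration under stated assumptions: stock C is taken as -3% (the value implied by the table of three-stock means), and the helper `frequencies` includes each interval's lower bound, whereas Excel's FREQUENCY counts values up to and including each bin's upper limit, so counts at interval edges may differ.

```python
from itertools import combinations

# Returns (%) of the six stocks; C = -3 is the value implied by the
# table of three-stock sample means.
returns = [8, 11, -3, 18, 3, 5]


def sample_means(size):
    """Means of every sample of `size` stocks drawn without repetition."""
    return [sum(c) / size for c in combinations(returns, size)]


def frequencies(values, bins=5):
    """Count the values falling in `bins` equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # The maximum is placed in the last interval instead of a sixth one.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


summary = {}
for size in (3, 4, 5):
    means = sample_means(size)
    summary[size] = {
        "samples": len(means),
        "mean of sample means": sum(means) / len(means),
        "range": (min(means), max(means)),
        "frequencies": frequencies(means),
    }
```

For every sample size the mean of the sample means equals the population mean of 7%, and the range of the sample means narrows as the samples get larger.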