Student's t-Distribution
Bell shaped.
Shape is defined by the degrees of freedom (df).
df is based on sample size (df = n − 1 for a sample mean).
Symmetrical about its mean.
Less peaked than the normal distribution.
Has fatter tails.
More probability in the tails, i.e., more observations lie
away from the centre of the distribution and there are more
outliers.
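
The "less peaked, fatter tails" properties can be checked numerically. The sketch below is only an illustration: it evaluates the standard textbook density formulas for the t-distribution and the standard normal at the centre (x = 0) and at a tail point (x = 3). The function names `t_pdf` and `normal_pdf` and the chosen df values are assumptions made for this example, not part of the notes.

```python
import math


def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom at x."""
    # lgamma avoids overflow of the gamma function for large df.
    log_coeff = math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1.0 + x * x / df) ** (-(df + 1) / 2.0)


# Compare the peak (x = 0) and a tail point (x = 3) for several df values.
for df in (2, 5, 30):
    print(f"df={df:>2}  peak={t_pdf(0.0, df):.4f}  tail={t_pdf(3.0, df):.4f}")
print(f"normal peak={normal_pdf(0.0):.4f}  tail={normal_pdf(3.0):.4f}")
```

For every df the t density should be lower than the normal at the centre and higher in the tail, with the gap narrowing as df grows.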
Central Limit Theorem (CLT)
For a random sample of size n from a population with mean μ and
finite variance σ², the sampling distribution of the sample mean x̄
approaches a normal probability distribution with mean μ and
variance σ²/n (population variance divided by sample size) as n
becomes large.

Point Estimate (PE)
Single (sample) value used to estimate a population parameter.
Estimator: Formula used to compute the PE.

Confidence Interval (CI) Estimates
Results in a range of values within which the actual parameter value
is expected to fall with probability 1 − α.
CI = PE ± (reliability factor × SE), where SE = standard error.
α = level of significance.
1 − α = degree of confidence.
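
A minimal sketch of the interval PE ± (reliability factor × SE) for a population mean. It assumes a z-based reliability factor, which is reasonable for large samples; for small samples with unknown population variance a t-based factor would normally be used instead. The function name `confidence_interval` and the sample values are hypothetical and serve only to illustrate the arithmetic.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, alpha=0.05):
    """Return (lower, upper) bounds: PE ± reliability factor × SE."""
    n = len(sample)
    point_estimate = mean(sample)                  # PE: the sample mean
    standard_error = stdev(sample) / math.sqrt(n)  # SE of the sample mean
    # Assumption: z-based reliability factor for a two-sided interval.
    reliability_factor = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = reliability_factor * standard_error
    return point_estimate - margin, point_estimate + margin


# Hypothetical observations, made up for illustration only.
observations = [0.8, 1.2, -0.5, 2.1, 0.3, 1.7, -0.2, 0.9]
lower, upper = confidence_interval(observations, alpha=0.05)
print(f"95% CI for the mean: [{lower:.3f}, {upper:.3f}]")
```

A smaller α (higher degree of confidence) gives a larger reliability factor and therefore a wider interval.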
Properties of CLT
For n ≥ 30, the sampling distribution of the mean is approx. normal.
Mean of the distribution of all possible sample means = population mean μ.
Variance of the distribution = σ²/n.
CLT applies only when the sample is random.

Desirable properties of an estimator
Unbiased: Expected value of the estimator equals the parameter, e.g.,
E(x̄) = μ, i.e., the expected sampling error is zero.
Efficient: If var(θ̂₁) < var(θ̂₂) for two unbiased estimators θ̂₁ and θ̂₂
of the same parameter, then θ̂₁ is more efficient than θ̂₂.
Consistent: As n increases, the value of the estimator approaches the
parameter and the sampling error approaches 0, e.g., as n → ∞,
x̄ → μ and SE → 0.
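
The CLT properties and the consistency of the sample mean can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be Exponential(1), which is skewed and has μ = 1 and σ² = 1, and the function name `simulate_sample_means`, the trial count, and the seed are choices made for this example only.

```python
import random
import statistics


def simulate_sample_means(n, trials=5000, seed=0):
    """Draw `trials` random samples of size n from Exponential(1); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]


# Population assumed for the sketch: Exponential(1), so mu = 1 and sigma^2 = 1.
for n in (5, 30, 120):
    means = simulate_sample_means(n)
    print(
        f"n={n:>3}  mean of sample means={statistics.fmean(means):.3f}  "
        f"variance={statistics.variance(means):.4f}  sigma^2/n={1.0 / n:.4f}"
    )
```

The mean of the sample means should stay close to μ for every n, while their variance should track σ²/n and shrink as n grows, which is the behaviour described for a consistent estimator.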
Biases
Data Mining Bias: Statistical significance of the pattern is
overestimated because the results were found through data mining.
Sample Selection Bias: Systematically excluding some data from
analysis. It makes the sample non-random.
Look-ahead Bias: Using sample data that wasn't available on the
test date.
Time-period Bias: Time period over which the data is gathered is
either too short or too long.