
Definition of 'Z-Test'

A statistical test used to determine whether two population means are different when the
variances are known and the sample size is large. The test statistic is assumed to have a
normal distribution and nuisance parameters such as standard deviation should be known
in order for an accurate z-test to be performed.

Investopedia explains 'Z-Test'


A one-sample location test, two-sample location test, paired difference test and
maximum likelihood estimate are examples of tests that can be conducted as z-tests.
Z-tests are closely related to t-tests, but t-tests are best performed when an experiment
has a small sample size. Also, t-tests assume that the standard deviation is unknown,
while z-tests assume that it is known. If the standard deviation of the population is
unknown, the assumption that the sample variance equals the population variance is made.
A z-test is used for testing the mean of a population versus a standard, or comparing the
means of two populations, with large (n ≥ 30) samples whether you know the population
standard deviation or not. It is also used for testing the proportion of some characteristic
versus a standard proportion, or comparing the proportions of two populations.
Example: Comparing the average engineering salaries of men versus women.
Example: Comparing the fraction defectives from 2 production lines.
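As an illustrative sketch of that second example, a two-proportion z-test can be written in a few lines; the defect counts below are hypothetical, and the pooled-proportion formula is the standard one for testing equality of two proportions:

```python
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions (pooled under H0)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                    # pooled proportion under H0
    se = (p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
    return z, p_value

# Hypothetical defect counts from two production lines
z, p = two_proportion_z_test(x1=18, n1=300, x2=35, n2=320)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With these made-up counts the second line's higher defect rate gives a p-value below 0.05, so the difference would be called significant at the conventional threshold.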
A t-test is used for testing the mean of one population against a standard or comparing the
means of two populations if you do not know the population standard deviation and
when you have a limited sample (n < 30). If you know the population standard
deviation, you may use a z-test.
Example: Measuring the average diameter of shafts from a certain machine when you
have a small sample.
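A minimal sketch of that one-sample t-test, using hypothetical shaft diameters and comparing the statistic against the standard tabulated two-sided critical value for 9 degrees of freedom:

```python
from statistics import mean, stdev

# Hypothetical shaft diameters (mm) from a small sample (n = 10)
diameters = [25.02, 24.97, 25.05, 24.99, 25.01, 25.03, 24.96, 25.00, 25.04, 24.98]
mu0 = 25.00                                  # hypothesised (target) diameter

n = len(diameters)
# t statistic: (sample mean - mu0) / (sample sd / sqrt(n))
t = (mean(diameters) - mu0) / (stdev(diameters) / n ** 0.5)

# Tabulated two-sided critical value for alpha = 0.05, df = n - 1 = 9
T_CRIT = 2.262
print(f"t = {t:.3f}; reject H0 at 5%: {abs(t) > T_CRIT}")
```

Here the sample mean sits well within sampling noise of the target, so the null hypothesis is not rejected.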
An F-test is used to compare the variances of two populations. The samples can be any
size. It is the basis of ANOVA.
Example: Comparing the variability of bolt diameters from two machines.
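A sketch of that variance comparison, with hypothetical bolt diameters; the critical value is the approximate tabulated F value for a two-sided 5% test with (9, 9) degrees of freedom:

```python
from statistics import variance

# Hypothetical bolt diameters (mm) from two machines
machine_a = [10.02, 9.98, 10.05, 9.95, 10.01, 10.04, 9.97, 10.03, 9.99, 10.06]
machine_b = [10.01, 10.00, 10.02, 9.99, 10.00, 10.01, 9.98, 10.02, 10.00, 9.99]

# F statistic: larger sample variance over the smaller, so F >= 1
v_a, v_b = variance(machine_a), variance(machine_b)
f = max(v_a, v_b) / min(v_a, v_b)

# Approximate tabulated critical value, two-sided 5% test, (9, 9) df
F_CRIT = 4.03
print(f"F = {f:.2f}; variances differ at 5%: {f > F_CRIT}")
```

In this made-up data machine A is noticeably more variable than machine B, so the F ratio exceeds the critical value.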
A matched-pair test is used to compare the means before and after something is done to
the samples. A t-test is often used because the samples are often small, but a z-test is
used when the samples are large. The test variable is the difference between the before
and after measurements.
Example: The average weight of subjects before and after following a diet for 6 weeks.
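A sketch of that paired test with hypothetical before/after weights; as the text says, the test variable is the per-subject difference, and with only 8 subjects a t critical value (df = 7) is used:

```python
from statistics import mean, stdev

# Hypothetical weights (kg) before and after 6 weeks on a diet
before = [82.1, 90.4, 77.8, 85.0, 95.2, 88.7, 79.9, 84.3]
after  = [80.5, 88.9, 77.1, 83.2, 93.0, 87.9, 78.4, 82.6]

# The test variable is the before-minus-after difference
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / n ** 0.5)

# Tabulated two-sided critical value for alpha = 0.05, df = n - 1 = 7
T_CRIT = 2.365
print(f"t = {t:.3f}; significant weight change: {abs(t) > T_CRIT}")
```

Every subject in this invented data lost weight, so the differences are consistently positive and the test comes out highly significant.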

Z-test Vs T-test
Sometimes, measuring every single item is just not practical. That is why we
developed and use statistical methods to solve problems. The most practical approach
is to measure just a sample of the population. Some methods test hypotheses by
comparison. Two of the better-known statistical hypothesis tests are the T-test and the
Z-test. Let us try to break down the two.
A T-test is a statistical hypothesis test in which the test statistic follows a Student's
t-distribution if the null hypothesis is true. The T-statistic was introduced by W.S. Gossett
under the pen name "Student", so the T-test is also referred to as the Student's T-test. It is
very likely that the T-test is the most commonly used statistical data analysis procedure for
hypothesis testing, since it is straightforward and easy to use. Additionally, it is flexible
and adaptable to a broad range of circumstances.
There are various T-tests; the two most commonly applied are the one-sample and
paired-sample T-tests. One-sample T-tests are used to compare a sample mean with the
known population mean. Two-sample T-tests, on the other hand, are used to compare
either independent samples or dependent samples.

A T-test is best applied, at least in theory, if you have a limited sample size (n < 30), as
long as the variables are approximately normally distributed and the variation of scores in
the two groups is not reliably different. It is also appropriate if you do not know the
population's standard deviation. If the standard deviation is known, then it would be best
to use another type of statistical test, the Z-test. The Z-test is also applied to compare
sample and population means to determine whether there's a significant difference between
them. Z-tests always use a normal distribution and are ideally applied when the standard
deviation is known. Z-tests are applied when certain conditions are met; otherwise, other
statistical tests like T-tests are applied instead. Z-tests are often applied to large
samples (n > 30). When a T-test is used with large samples, it becomes very similar to
the Z-test. However, the sample variance used by a T-test fluctuates from sample to
sample in a way the Z-test's known variance does not, so the two tests can give slightly
different results.
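That convergence can be seen in a small simulation (the population parameters and seed below are arbitrary): with a large sample, the t statistic computed from the sample standard deviation lands very close to the z statistic computed from the known population standard deviation.

```python
import random
from statistics import mean, stdev

random.seed(0)

# Simulate one large sample (n = 1000) from N(mu = 100, sigma = 15)
MU, SIGMA, N = 100.0, 15.0, 1000
sample = [random.gauss(MU, SIGMA) for _ in range(N)]

se_known  = SIGMA / N ** 0.5             # z-test: population sigma known
se_sample = stdev(sample) / N ** 0.5     # t-test: sigma estimated from the data

z = (mean(sample) - MU) / se_known
t = (mean(sample) - MU) / se_sample
print(f"z = {z:.3f}, t = {t:.3f}")       # nearly identical at this sample size
```

With n = 1000 the sample standard deviation is within a few percent of the true value, so z and t differ only slightly; at n < 30 that gap, plus the heavier tails of the t-distribution, is what separates the two tests.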
Summary:
1. A Z-test is a statistical hypothesis test whose statistic follows a normal distribution,
while a T-test's statistic follows a Student's t-distribution.
2. A T-test is appropriate when you are handling small samples (n < 30) while a Z-test is
appropriate when you are handling moderate to large samples (n > 30).
3. The T-test is more adaptable than the Z-test, since the Z-test requires certain conditions
to be reliable. Additionally, the T-test comes in several variants to suit different designs.

4. T-tests are more commonly used than Z-tests.


5. Z-tests are preferred to T-tests when the standard deviations are known.

Data types that can be analysed with z-tests


- data points should be independent from each other
- the z-test is preferable when n is greater than 30
- the distributions should be normal if n is low; if, however, n > 30, the distribution of
the data does not have to be normal
- the variances of the samples should be the same (F-test)
- all individuals must be selected at random from the population
- all individuals must have an equal chance of being selected
- sample sizes should be as equal as possible, but some differences are allowed

Data types that can be analysed with t-tests


- data sets should be independent from each other, except in the case of the paired-sample
t-test
- where n < 30, t-tests should be used
- the distributions should be normal for the equal- and unequal-variance t-tests (K-S test
or Shapiro-Wilk)
- the variances of the samples should be the same (F-test) for the equal-variance t-test
- all individuals must be selected at random from the population
- all individuals must have an equal chance of being selected
- sample sizes should be as equal as possible, but some differences are allowed
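The rules of thumb in the two lists above can be sketched as a small decision helper. This is only an illustrative simplification: the function name, data, and the rough F-ratio threshold are all made up for the example, and in practice a formal F-test (and normality test) would be used.

```python
from statistics import variance

def choose_test(sample1, sample2, sigma_known=False):
    """Crude decision helper following the rules of thumb above:
    n >= 30 with known sigma -> z-test; otherwise a t-test, with the
    unequal-variance form when the variance ratio is far from 1."""
    n1, n2 = len(sample1), len(sample2)
    large = n1 >= 30 and n2 >= 30
    v1, v2 = variance(sample1), variance(sample2)
    f_ratio = max(v1, v2) / min(v1, v2)
    if large and sigma_known:
        return "z-test"
    if f_ratio > 4.0:      # rough threshold standing in for a formal F-test
        return "unequal-variance t-test"
    return "equal-variance t-test"

# Two small hypothetical samples with similar spread
a = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.1, 4.7]
b = [5.4, 5.6, 5.2, 5.5, 5.3, 5.7, 5.4, 5.6]
print(choose_test(a, b))
```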

The F-statistic is a value resulting from a standard statistical test used in ANOVA and
regression analysis to determine whether the means of two or more populations differ
significantly, by comparing the variance between groups with the variance within groups.
For practical purposes, it is important to know that this value determines the P-value,
but the F-statistic number will not actually be used in the interpretation here.
Significance, or P-value, is the probability that an effect at least as extreme as the
current observation has occurred by chance. Therefore, in these particular examples, the
drop in the prevalence of low waz (weight-for-age z-score) from 38% to 26% in groups
with better roofing, and from 40% to 16% in groups with higher education, is unlikely to
have occurred by chance. For the roofing example, P or Sig = 0.031: about 97 of every
100 times, this difference would not occur by chance alone. For the education example,
P or Sig = 0.000: there is greater than 99.9% certainty that the difference did not occur
by chance. In medical research, if the P-value is less than or equal to 0.05, meaning that
there is no more than a 5%, or 1 in 20, probability of observing a result as extreme as
that observed solely due to chance, then the association between the exposure and disease
is considered statistically significant.
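A small sketch of how a two-sided P-value is obtained from a normal test statistic; the z values 1.96 and 2.58 are the standard cutoffs, landing at roughly the 5% and 1% thresholds:

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 1.96 and 2.58 sit at the conventional 5% and 1% thresholds
print(f"{two_sided_p(1.96):.4f}")   # close to 0.05
print(f"{two_sided_p(2.58):.4f}")   # close to 0.01
```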
THE RELATIONSHIP BETWEEN CONFIDENCE INTERVALS AND SIGNIFICANCE
TESTS
A confidence interval is a range of population values with which the sample data are
compatible. A significance test considers the likelihood that the sample data has come
from a particular hypothesised population.
The 95% confidence interval consists of all values less than 1.96 standard errors away
from the sample value; testing against any population value in this interval will lead to
p > 0.05. Testing against values outside the 95% confidence interval (which are more than
1.96 standard errors away) will lead to p-values < 0.05.
Similarly, the 99% confidence interval consists of all values less than 2.58 standard
errors away from the sample value; testing against any hypothesised population value in
this interval will give a p-value > 0.01. Testing against values outside the 99% confidence
interval (which are more than 2.58 standard errors away) will lead to p-values < 0.01. In
general, testing against a hypothesised value inside a 100(1 − α)% confidence interval
gives p > α, while testing against a value outside it gives p < α.
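This duality between confidence intervals and significance tests can be demonstrated directly; the sample mean, standard error, and hypothesised values below are arbitrary illustrative numbers:

```python
from statistics import NormalDist

def ci_and_p(sample_mean, se, mu0, z_crit=1.96):
    """95% CI for the mean, and the two-sided p-value for H0: mu = mu0."""
    lo, hi = sample_mean - z_crit * se, sample_mean + z_crit * se
    z = (sample_mean - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return (lo, hi), p

# A hypothesised mean inside the interval gives p > 0.05; outside, p < 0.05
ci, p_in  = ci_and_p(100.0, 2.0, 103.0)   # 103 lies inside (96.08, 103.92)
_,  p_out = ci_and_p(100.0, 2.0, 105.0)   # 105 lies outside
print(ci, p_in > 0.05, p_out < 0.05)
```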

Examples
1) The mean birthweight of 53 CMV infected babies was 3060.75g (standard deviation =
601.03g, standard error = 82.57g).

A 95% confidence interval for the population mean birthweight of CMV infected babies
is therefore given by:
(3060.75 ± 1.96(82.57))g = (2898.91, 3222.59)g
Similarly, the 99% confidence interval for the mean is:
(3060.75 ± 2.58(82.57))g = (2847.72, 3273.78)g
We are 95% confident that the true mean is somewhere between 2898.91 and 3222.59g;
testing against values outside this range will lead to p-values < 0.05.
We are 99% confident that the true mean is between 2847.72 and 3273.78g (notice that
this is a wider interval; we are more confident that it contains the population mean).
Testing against values within this range will lead to p-values > 0.01.
The test given previously showed that the sample mean was significantly different from
a hypothesised population mean of 3263.57g. The p-value for that test was 0.014 and this
corresponds to the hypothesised population mean of 3263.57g lying outside the 95%
confidence interval but inside the 99%.
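The intervals in this example can be reproduced directly from the quoted mean and standard error:

```python
# Values from the CMV birthweight example above
mean_bw, se = 3060.75, 82.57
Z95, Z99 = 1.96, 2.58

ci95 = (mean_bw - Z95 * se, mean_bw + Z95 * se)
ci99 = (mean_bw - Z99 * se, mean_bw + Z99 * se)
print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f}) g")
print(f"99% CI: ({ci99[0]:.2f}, {ci99[1]:.2f}) g")
```

This recovers (2898.91, 3222.59)g and (2847.72, 3273.78)g, matching the worked example.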
2) A sample of 33 boys with recurrent infections have their diastolic blood pressures
measured. Their mean blood pressure is 62.5 mmHg, standard deviation 8.2.
Using the sample standard deviation to estimate the population standard deviation, the
means of samples of size 33 will be distributed with standard error
8.2/√33 = 1.43 mmHg
Therefore, a 99% confidence interval for the mean diastolic blood pressure of boys with
recurrent infections is (62.5 ± 2.58(1.43)) = (58.81, 66.18) mmHg.
We wish to know whether boys with recurrent infections are different from boys in
general who are known to have pressures of on average 58.5mmHg. The null hypothesis
to be tested is that the 33 boys come from a population with a mean dbp of 58.5mmHg.
The observed sample mean is (62.5 − 58.5)/1.43 = 2.797 standard errors away from the
hypothesised mean of 58.5mmHg. Consulting the table of the normal distribution, we
find 0.002 < p < 0.01. Using the linked spreadsheet, we get the exact p-value of 0.005: a
sample with a mean 4mmHg away from the hypothesised value would occur by chance
one time in 200 (5 in 1000).
The 99% confidence interval does not contain the hypothesised mean and p < 0.01 as
expected.
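This whole calculation can be reproduced from the quoted sample values:

```python
from statistics import NormalDist

# Values from the blood-pressure example above
n, sample_mean, sd = 33, 62.5, 8.2
mu0 = 58.5                              # hypothesised population mean dbp

se = sd / n ** 0.5                      # standard error, about 1.43 mmHg
z = (sample_mean - mu0) / se            # about 2.80 standard errors
p = 2 * (1 - NormalDist().cdf(z))       # two-sided p-value, about 0.005
print(f"se = {se:.2f}, z = {z:.3f}, p = {p:.4f}")
```

This recovers the standard error of 1.43 mmHg, the distance of 2.797 standard errors, and the exact p-value of 0.005 quoted in the text.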

Definition of 'Hypothesis Testing'


A process by which an analyst tests a statistical hypothesis. The methodology employed
by the analyst depends on the nature of the data used and the goals of the analysis. The
goal is to either reject the null hypothesis or fail to reject it.
