Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. It is applicable to a wide variety
of academic disciplines, from the natural and social sciences to the humanities, and to government and business.
Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. In addition, patterns in the data may be modeled
in a way that accounts for randomness and uncertainty in the observations, and then used to draw inferences about the process or population being studied; this is
called inferential statistics. Both descriptive and inferential statistics comprise applied statistics. There is also a discipline called mathematical statistics, which is
concerned with the theoretical basis of the subject.
In applying statistics to a scientific, industrial, or societal problem, one begins with a process or population to be studied. This might be a population of people in a
country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period. It may instead be a process observed at various times;
data collected about this kind of "population" constitute what is called a time series.
For practical reasons, rather than compiling data about an entire population, one usually studies a chosen subset of the population, called a sample. Data are
collected about the sample in an observational or experimental setting. The data are then subjected to statistical analysis, which serves two related purposes:
description and inference.
Descriptive statistics can be used to summarize the data, either numerically or graphically, to describe the sample. Basic examples of numerical descriptors
include the mean and standard deviation. Graphical summarizations include various kinds of charts and graphs.
Inferential statistics is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population. These inferences may
take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), descriptions of association (correlation), or
modeling of relationships (regression). Other modeling techniques include ANOVA, time series, and data mining.
“… it is only the manipulation of uncertainty that interests us. We are not concerned with the matter that is uncertain. Thus we do not study the mechanism of rain;
only whether it will rain.”
The concept of correlation is particularly noteworthy. Statistical analysis of a data set may reveal that two variables (that is, two properties of the population under
consideration) tend to vary together, as if they are connected. For example, a study of annual income and age of death among people might find that poor people
tend to have shorter lives than affluent people. The two variables are said to be correlated. However, one cannot immediately infer the existence of a causal
relationship between the two variables (see Correlation does not imply causation). The correlated phenomena could be caused by a third, previously unconsidered
phenomenon, called a lurking variable.
If the sample is representative of the population, then inferences and conclusions made from the sample can be extended to the population as a whole. A major
problem lies in determining the extent to which the chosen sample is representative. Statistics offers methods to estimate and correct for randomness in the
sample and in the data collection procedure, as well as methods for designing robust experiments in the first place.
The fundamental mathematical concept employed in understanding such randomness is probability. Mathematical statistics (also called statistical theory) is the
branch of applied mathematics that uses probability theory and analysis to examine the theoretical basis of statistics.
The use of any statistical method is valid only when the system or population under consideration satisfies the basic mathematical assumptions of the method.
Misuse of statistics can produce subtle but serious errors in description and interpretation — subtle in that even experienced professionals sometimes make such
errors, and serious in that they may affect social policy, medical practice and the reliability of structures such as bridges and nuclear power plants. Even when
statistics is correctly applied, the results can be difficult to interpret for a non-expert. For example, the statistical significance of a trend in the data — which
measures the extent to which the trend could be caused by random variation in the sample — may not agree with one's intuitive sense of its significance. The set
of basic statistical skills (and skepticism) needed by people to deal with information in their everyday lives is referred to as statistical literacy.
In statistics, the kinds of descriptive statistics and significance tests that are appropriate depend on the level of measurement of the variables concerned. The four
levels of this classification scheme, and the statistics and operations each supports, are summarized below.
Level      Statistics                                  Relation or operation                  Mathematical structure
nominal    mode                                        equality (=)                           set
ordinal    median                                      order (<)                              totally ordered set
interval   mean, standard deviation                    subtraction (-) and weighted average   affine line
ratio      geometric mean, coefficient of variation    addition (+) and multiplication (×)    field
Nominal measurement
In this type of measurement, names are assigned to objects as labels. This assignment is performed by evaluating, by some procedure, the similarity of the to-be-
measured instance to each of a set of named exemplars or category definitions. The name of the most similar named exemplar or definition in the set is the "value"
assigned by nominal measurement to the given instance. If two instances have the same name associated with them, they belong to the same category, and that
is the only significance that nominal measurements have. Variables that are measured only nominally are also called categorical variables.
Nominal numbers
For practical data processing the names may be numerals, but in that case the numerical value of these numerals is irrelevant, and the concept is now sometimes
referred to as nominal number. The only comparisons that can be made between variable values are equality and inequality. There are no "less than" or "greater
than" relations among the classifying names, nor operations such as addition or subtraction.
In social research, variables measured at a nominal level include gender, marital status, race, religious affiliation, political party affiliation, college major, and
birthplace. Other examples include: geographical location in a country represented by that country's international telephone access code, or the make or model of
a car.
Statistical measures
The only kind of measure of central tendency is the mode; median and mean cannot be defined.
Statistical dispersion may be measured with various indices of qualitative variation, but no notion of standard deviation exists.
Ordinal measurement
In this classification, the numbers assigned to objects represent the rank order (1st, 2nd, 3rd etc.) of the entities measured. The numbers are called ordinals. The
variables are called ordinal variables or rank variables. Comparisons of greater and less can be made, in addition to equality and inequality. However, operations
such as conventional addition and subtraction are still meaningless.
Examples include the Mohs scale of mineral hardness; the results of a horse race, which say only which horses arrived first, second, third, etc. but no time
intervals; and many measurements in psychology and other social sciences, for example attitudes like preference, conservatism or prejudice and social class.
Statistical measures
The central tendency of an ordinally measured variable can be represented by its mode or its median, but the mean cannot be defined.
One can define quantiles, notably quartiles and percentiles, together with maximum and minimum, but no new measures of statistical dispersion beyond the
nominal ones can be defined: one cannot define the range or the interquartile range, since one cannot subtract quantities.
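As an illustration (the survey data below are hypothetical), a minimal Python sketch of the summaries that remain permissible for an ordinal variable: the mode, the median, and positional quartiles, with no mean.

```python
from statistics import mode, median

# Hypothetical ordinal data: survey responses coded 1 (strongly disagree) .. 5 (strongly agree).
# The codes carry order but not distance, so only order-based summaries are reported.
responses = [2, 4, 4, 3, 5, 1, 4, 3, 3, 2, 4]

print("mode:", mode(responses))       # most frequent category
print("median:", median(responses))   # middle value in rank order (n is odd, so no averaging is needed)

# Crude positional quartiles: the values at the 25% and 75% positions of the ordered data.
ordered = sorted(responses)
n = len(ordered)
print("lower quartile:", ordered[n // 4])
print("upper quartile:", ordered[(3 * n) // 4])
# No mean is computed: averaging ordinal codes would assume equal spacing between categories.
```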
Interval measurement
The numbers assigned to objects have all the features of ordinal measurements, and in addition equal differences between measurements represent equivalent
intervals. That is, differences between arbitrary pairs of measurements can be meaningfully compared. Operations such as averaging and subtraction are
therefore meaningful, but addition is not, and a zero point on the scale is arbitrary; negative values can be used. The formal mathematical term is an affine space
(in this case an affine line). Variables measured at the interval level are called interval variables, or sometimes scaled variables, as they have a notion of units of
measurement, though the latter usage is not obvious and is not recommended.
Ratios between numbers on the scale are not meaningful, so operations such as multiplication and division cannot be carried out directly. But ratios of differences
can be expressed; for example, one difference can be twice another.
Examples of interval measures are the year date in many calendars, and temperature in Celsius scale or Fahrenheit scale; temperature in the Kelvin scale is a
ratio measurement, however.
Statistical measures
The central tendency of a variable measured at the interval level can be represented by its mode, its median, or its arithmetic mean.
Statistical dispersion can be measured in most of the usual ways, which involve only differences or averaging, such as the range, interquartile range, and standard
deviation.
Since one cannot divide, one cannot define measures that require a ratio, such as studentized range or coefficient of variation.
More subtly, while one can define moments about the origin, only central moments are useful, since the choice of origin is arbitrary and not meaningful. One can
define standardized moments, since ratios of differences are meaningful, but one cannot define coefficient of variation, since the mean is a moment about the
origin, unlike the standard deviation, which is (the square root of) a central moment.
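The following sketch (with made-up temperature data) illustrates why the coefficient of variation is not defined on an interval scale: converting the same measurements from Celsius to Fahrenheit rescales the standard deviation consistently, but changes the ratio of standard deviation to mean, because the zero point is arbitrary.

```python
from statistics import mean, pstdev

# Hypothetical daily temperatures in degrees Celsius (an interval scale).
celsius = [10.0, 15.0, 20.0, 25.0, 30.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]   # the same quantities, with a different arbitrary zero

for name, values in (("Celsius", celsius), ("Fahrenheit", fahrenheit)):
    m, s = mean(values), pstdev(values)
    print(f"{name:10s} mean={m:6.2f}  sd={s:5.2f}  sd/mean={s / m:.3f}")

# The standard deviation transforms consistently (it is simply multiplied by 9/5),
# but sd/mean (the coefficient of variation) differs between the two scales,
# because the mean depends on the arbitrary choice of zero point.
```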
Ratio measurement
The numbers assigned to objects have all the features of interval measurement and also have meaningful ratios between arbitrary pairs of numbers. Operations
such as multiplication and division are therefore meaningful. The zero value on a ratio scale is non-arbitrary. Variables measured at the ratio level are called ratio
variables.
Most physical quantities, such as mass, length or energy are measured on ratio scales; so is temperature measured in kelvins, that is, relative to absolute zero.
Social variables of ratio measure include age, length of residence in a given place, number of organizations belonged to or number of church attendances in a
particular time.
Statistical measures
All statistical measures can be used for a variable measured at the ratio level, as all necessary mathematical operations are defined.
The central tendency of a variable measured at the ratio level can be represented by its mode, its median, or its arithmetic mean, and also by its geometric mean.
In addition to the measures of statistical dispersion defined for interval variables, such as range and standard deviation, for ratio variables one can also define
measures that require a ratio, such as studentized range or coefficient of variation.
In a ratio variable, unlike in an interval variable, the moments about the origin are meaningful, since the origin is not arbitrary.
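A minimal sketch, using hypothetical income figures, of two ratio-scale summaries that become available once the zero point is meaningful: the geometric mean and the coefficient of variation (it assumes Python 3.8+ for statistics.geometric_mean).

```python
from statistics import mean, pstdev, geometric_mean

# Hypothetical incomes (a ratio variable: zero income is a true zero).
incomes = [18_000, 25_000, 32_000, 40_000, 95_000]

arith = mean(incomes)
geo = geometric_mean(incomes)        # defined because all values are positive on a ratio scale
cv = pstdev(incomes) / arith         # coefficient of variation: dispersion relative to the mean

print(f"arithmetic mean: {arith:,.0f}")
print(f"geometric mean:  {geo:,.0f}")   # less sensitive to the single large value
print(f"coefficient of variation: {cv:.2f}")
```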
"True measurement"
The interval and ratio measurement levels are sometimes collectively called "true measurement", although it has been argued that this usage reflects a lack of
understanding of the uses of ordinal measurement. Only ratio or interval scales can correctly be said to have units of measurement.
In statistics, the Pearson product-moment correlation coefficient (sometimes abbreviated PMCC, and denoted r) is a common measure of the correlation between
two variables X and Y. When measured in a population the Pearson Product Moment correlation is designated by the Greek letter rho (ρ). When computed in a
sample, it is designated by the letter "r" and is sometimes called "Pearson's r." Pearson's correlation reflects the degree of linear relationship between two
variables. It ranges from +1 to -1. A correlation of +1 means that there is a perfect positive linear relationship between variables. A correlation of -1 means that
there is a perfect negative linear relationship between variables. A correlation of 0 means there is no linear relationship between the two variables. In practice,
correlations computed from data are rarely exactly 0, +1, or −1; the sign of r indicates whether the relationship is positive or negative, and its magnitude indicates the strength of the linear relationship.
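As a rough sketch of how Pearson's r is computed from a sample (the paired data and the function name pearson_r are made up for illustration), the code below implements the product-moment formula directly: the sum of products of deviations from the means, divided by the product of the two root sums of squared deviations.

```python
import math

def pearson_r(x, y):
    """Sample Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # numerator: co-variation of the two variables; denominator: product of their spreads
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) *
                    sum((yi - mean_y) ** 2 for yi in y))
    return num / den

# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
print(round(pearson_r(x, y), 3))   # close to +1: a strong positive linear relationship
```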
Spearman's rank correlation coefficient, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as rs, is a non-parametric measure of
correlation – that is, it assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions
about the frequency distribution of the variables. Unlike the Pearson product-moment correlation coefficient, Spearman's rank correlation coefficient does not
require the assumption that the relationship between the variables is linear, nor does it require the variables to be measured on interval scales; it can be used for
variables measured at the ordinal level.
However, Spearman's rho does assume that successive ranks indicate equidistant positions on the variable measured. For example, using Spearman's rho for the
Likert scales often used in psychology, sociology, biology and related disciplines assumes that the (psychologically) "felt distances" between adjacent scale points
are the same across the whole Likert scale used.
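A hedged sketch of Spearman's rank correlation with hypothetical data: each variable is replaced by its rank and the classic formula ρ = 1 − 6Σd²/(n(n² − 1)) is applied, which agrees with Pearson's r computed on the ranks when there are no ties.

```python
def rank(values):
    """Rank observations from 1 (smallest) to n; assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, index in enumerate(order, start=1):
        r[index] = position
    return r

def spearman_rho(x, y):
    """Spearman's rho via rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without tied ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical paired observations: a monotonic but clearly non-linear relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 4, 9, 20, 50, 120, 300, 800]
print(spearman_rho(x, y))   # 1.0: perfect monotonic association despite the non-linearity
```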
A statistical hypothesis test is a method of making statistical decisions from and about experimental data. Null-hypothesis testing just answers the question of "how
well the findings fit the possibility that chance factors alone might be responsible."[1] This is done by asking and answering a hypothetical question. One use is
deciding whether experimental results contain enough information to cast doubt on conventional wisdom.
As an example, consider determining whether a suitcase contains some radioactive material. Placed under a Geiger counter, it produces 10 counts per minute.
The null hypothesis is that no radioactive material is in the suitcase and that all measured counts are due to ambient radioactivity typical of the surrounding air and
harmless objects in a suitcase. We can then calculate how likely it is that the null hypothesis produces 10 counts per minute. If it is likely, for example if the null
hypothesis predicts on average 9 counts per minute and a standard deviation of 1 count per minute, we say that the suitcase is compatible with the null hypothesis
(which does not imply that there is no radioactive material; we simply cannot tell from this measurement); on the other hand, if the null hypothesis predicts, for
example, 1 count per minute and a standard deviation of 1 count per minute, then the suitcase is not compatible with the null hypothesis and other factors are
likely responsible for the measurements.
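The suitcase example can be made concrete with a few lines of code using the numbers quoted above; the normal approximation and the cut-off of 3 standard deviations are assumptions made purely for illustration.

```python
observed = 10.0   # counts per minute measured from the suitcase

# Two versions of the null hypothesis ("only ambient radioactivity"), as in the text above.
for null_mean, null_sd in ((9.0, 1.0), (1.0, 1.0)):
    z = (observed - null_mean) / null_sd
    # The 3-standard-deviation cut-off below is an arbitrary illustrative threshold.
    verdict = "compatible with the null hypothesis" if abs(z) < 3 else "not compatible"
    print(f"null predicts {null_mean} +/- {null_sd} cpm: z = {z:.1f} -> {verdict}")

# z = 1 for the first null hypothesis (well within ordinary fluctuation) and z = 9 for the second,
# which would be extraordinarily unlikely if that null hypothesis were true.
```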
The test described here is more fully the null-hypothesis statistical significance test. The null hypothesis is a conjecture that exists solely to be falsified by the
sample. Statistical significance is a possible finding of the test - that the sample is unlikely to have occurred by chance given the truth of the null hypothesis. The
name of the test describes its formulation and its possible outcome. One characteristic of the test is its crisp decision: reject or do not reject (which is not the same
as accept). A calculated value is compared to a threshold.
One may be faced with the problem of making a definite decision with respect to an uncertain hypothesis which is known only through its observable
consequences. A statistical hypothesis test, or more briefly, hypothesis test, is an algorithm to choose between the alternatives (for or against the hypothesis)
which minimizes certain risks.
This article describes the commonly used frequentist treatment of hypothesis testing. From the Bayesian point of view, it is appropriate to treat hypothesis testing
as a special case of normative decision theory (specifically a model selection problem) and it is possible to accumulate evidence in favor of (or against) a
hypothesis using concepts such as likelihood ratios known as Bayes factors.
There are several preparations we make before we observe the data.
The null hypothesis must be stated in mathematical/statistical terms that make it possible to calculate the probability of possible samples assuming the hypothesis
is correct. For example: The mean response to treatment being tested is equal to the mean response to the placebo in the control group. Both responses have the
normal distribution with this unknown mean and the same known standard deviation ... (value).
A test statistic must be chosen that will summarize the information in the sample that is relevant to the hypothesis. In the example given above, it might be the
numerical difference between the two sample means, m1 − m2.
The distribution of the test statistic is used to calculate the probability of sets of possible values (usually an interval or union of intervals). In this example, the
difference between sample means would have a normal distribution with a standard deviation equal to the common standard deviation times the factor
√(1/n1 + 1/n2), where n1 and n2 are the sample sizes. Among all the sets of possible values, we must choose one that we think represents the most extreme evidence against the
hypothesis. That is called the critical region of the test statistic. The probability of the test statistic falling in the critical region when the null hypothesis is correct, is
called the alpha value (or size) of the test.
The probability that a sample falls in the critical region when the parameter is θ, where θ is for the alternative hypothesis, is called the power of the test at θ. The
power function of a critical region is the function that maps θ to the power of the test at θ.
After the data are available, the test statistic is calculated and we determine whether it is inside the critical region.
If the test statistic is inside the critical region, then our conclusion is one of the following:
Reject the null hypothesis. (Therefore the critical region is sometimes called the rejection region, while its complement is the acceptance region.)
An event of probability less than or equal to alpha has occurred.
The researcher has to choose between these logical alternatives. In the example we would say: the observed response to treatment is statistically significant.
If the test statistic is outside the critical region, the only conclusion is that there is not enough evidence to reject the null hypothesis. This is not the same as
evidence in favor of the null hypothesis. That we cannot obtain using these arguments, since lack of evidence against a hypothesis is not evidence for it. On this
basis, statistical research progresses by eliminating error, not by finding the truth.
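The sketch below walks through the preparations and the decision step described above for the treatment-versus-placebo example, under the stated assumptions of normally distributed responses with a known common standard deviation; the data, the value of that standard deviation, and the 5% size of the test are hypothetical choices (statistics.NormalDist requires Python 3.8+).

```python
import math
from statistics import NormalDist, mean

# Hypothetical samples: responses to treatment and to placebo.
treatment = [5.1, 6.3, 5.8, 6.7, 5.9, 6.2, 6.0, 6.5]
placebo = [5.0, 5.4, 5.2, 5.6, 5.1, 5.3, 5.5, 5.2]
sigma = 0.5                     # known common standard deviation assumed by the null model
alpha = 0.05                    # chosen size of the test

# Test statistic: the difference of sample means m1 - m2.
m1, m2 = mean(treatment), mean(placebo)
diff = m1 - m2

# Under the null hypothesis the difference is normal with mean 0 and this standard deviation.
n1, n2 = len(treatment), len(placebo)
se = sigma * math.sqrt(1 / n1 + 1 / n2)

# Critical region: absolute differences larger than the two-sided critical value.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
critical_value = z_crit * se

print(f"difference of means = {diff:.3f}, critical value = {critical_value:.3f}")
if abs(diff) > critical_value:
    print("test statistic falls in the critical region: reject the null hypothesis")
else:
    print("not enough evidence to reject the null hypothesis")
```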
Simple hypothesis - Any hypothesis which specifies the population distribution completely.
Composite hypothesis - Any hypothesis which does not specify the population distribution completely.
Statistical test - A decision function that takes its values in the set of hypotheses.
Region of acceptance - The set of values for which we fail to reject the null hypothesis.
Region of rejection / Critical region - The set of values of the test statistic for which the null hypothesis is rejected.
Power of a test (1-β) - The test's probability of correctly rejecting the null hypothesis. The complement of the false negative rate.
Size / Significance level of a test (α) - For simple hypotheses, this is the test's probability of incorrectly rejecting the null hypothesis. The false positive rate. For
composite hypotheses this is the upper bound of the probability of rejecting the null hypothesis over all cases covered by the null hypothesis.
Most powerful test - For a given size or significance level, the test with the greatest power.
In statistics, a result is called 'statistically significant' if it is unlikely to have occurred by chance. "A statistically significant difference" simply means there is
statistical evidence that there is a difference; it does not mean the difference is necessarily large, important or significant in the common meaning of the word.
The significance level of a test is a traditional frequentist statistical hypothesis testing concept. In simple cases, it is defined as the probability of making a decision
to reject the null hypothesis when the null hypothesis is actually true (a decision known as a Type I error, or "false positive determination"). The decision is often
made using the p-value: if the p-value is less than the significance level, then the null hypothesis is rejected. The smaller the p-value, the more significant the result
is said to be.
In more complicated, but practically important cases, the significance level of a test is a probability such that the probability of making a decision to reject the null
hypothesis when the null hypothesis is actually true is no more than the stated probability. This allows for those applications where the probability of deciding to
reject may be much smaller than the significance level for some sets of assumptions encompassed within the null hypothesis.
The significance level is usually represented by the Greek symbol, α (alpha). Popular levels of significance are 5%, 1% and 0.1%. If a test of significance gives a
p-value lower than the α-level, the null hypothesis is rejected. Such results are informally referred to as 'statistically significant'. For example, if someone argues
that "there's only one chance in a thousand this could have happened by coincidence," a 0.1% level of statistical significance is being implied. The lower the
significance level, the stronger the evidence.
Different α-levels have different advantages and disadvantages. Smaller α-levels give greater confidence in the determination of significance, but run greater risks
of failing to reject a false null hypothesis (a Type II error, or "false negative determination"), and so have less statistical power. The selection of an α-level inevitably
involves a compromise between significance and power, and consequently between the Type I error and the Type II error.
Fixed significance levels such as those mentioned above may be regarded as useful in exploratory data analyses. However, modern statistical advice is that,
where the outcome of a test is essentially the final outcome of an experiment or other study, the p-value should be quoted explicitly. And, importantly, it should be
quoted whether or not the p-value is judged to be significant. This is to allow maximum information to be transferred from a summary of the study into meta-
analyses.
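As an illustration of what the significance level means (the normal model, sample size, and number of trials below are arbitrary choices), the simulation repeatedly draws data from a world in which the null hypothesis is true and counts how often a test of size α = 0.05 rejects it; the long-run rejection rate should be close to 5%.

```python
import random
from statistics import NormalDist, mean

random.seed(0)
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value, about 1.96

n, sigma, mu0 = 25, 1.0, 0.0   # sample size, known sd, and the true mean under the null
trials = 10_000
rejections = 0

for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    if abs(z) > z_crit:
        rejections += 1     # a Type I error: the null hypothesis is true but is rejected

print(f"rejection rate under a true null: {rejections / trials:.3f} (should be near {alpha})")
```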
In statistics, a null hypothesis (H0) is a hypothesis set up to be nullified or refuted in order to support an alternative hypothesis. When used, the null hypothesis is
presumed true until statistical evidence, in the form of a hypothesis test, indicates otherwise — that is, when the researcher has a certain degree of confidence,
usually 95% to 99%, that the data does not support the null hypothesis. It is possible for an experiment to fail to reject the null hypothesis. It is also possible that
both the null hypothesis and the alternate hypothesis are rejected if there are more than those two possibilities.
In scientific and medical applications, the null hypothesis plays a major role in testing the significance of differences in treatment and control groups. The
assumption at the outset of the experiment is that no difference exists between the two groups (for the variable being compared): this is the null hypothesis in this
instance. Other types of null hypothesis may be, for example, that:
values in samples from a given population can be modelled using a certain family of statistical distributions.
the variability of data in different groups is the same, although they may be centred around different values.
The alternative hypothesis (or maintained hypothesis or research hypothesis) and the null hypothesis are the two rival hypotheses whose likelihoods are compared
by a statistical hypothesis test. Usually the alternative hypothesis is the possibility that an observed effect is genuine and the null hypothesis is the rival possibility
that it has resulted from random chance.
The classical (or frequentist) approach is to calculate the probability that the observed effect (or one more extreme) will occur if the null hypothesis is true. If this
value (sometimes called the "p-value") is small then the result is called statistically significant and the null hypothesis is rejected in favour of the alternative
hypothesis. If not, then the null hypothesis is not rejected. Incorrectly rejecting the null hypothesis is a Type I error; incorrectly failing to reject it is a Type II error.
In statistics, the standard deviation of a probability distribution, random variable, or population or multiset of values is a measure of the spread of its values. The standard
deviation is usually denoted with the letter σ (lower case sigma). It is defined as the square root of the variance.
To understand standard deviation, keep in mind that variance is the average of the squared differences between data points and the mean. Variance is tabulated in
units squared. Standard deviation, being the square root of that quantity, therefore measures the spread of data about the mean, measured in the same units as
the data.
Stated more formally, the standard deviation is the root mean square (RMS) deviation of values from their arithmetic mean.
For example, in the population {4, 8}, the mean is 6 and the deviations from the mean are {−2, 2}. Those deviations squared are {4, 4}, the average of which (the
variance) is 4. Therefore, the standard deviation is 2. In this case 100% of the values in the population lie exactly one standard deviation from the mean.
The standard deviation is the most common measure of statistical dispersion, measuring how widely spread the values in a data set are. If many data points are
close to the mean, then the standard deviation is small; if many data points are far from the mean, then the standard deviation is large. If all the data values are
equal, then the standard deviation is zero.
For a population with mean μ, the standard deviation is σ = √(Σ(x − μ)² / N); when only a sample with mean x̄ is available, σ is usually estimated by the modified (sample) standard deviation s = √(Σ(x − x̄)² / (n − 1)).
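A short sketch reproducing the {4, 8} example and contrasting the population formula with the sample estimate; the five-value sample is hypothetical.

```python
from statistics import pvariance, pstdev, stdev

population = [4, 8]
print(pvariance(population))   # 4: the average squared deviation from the mean (6)
print(pstdev(population))      # 2.0: the population standard deviation, the square root of the variance

# For a sample, the usual estimate divides by n - 1 instead of n.
sample = [4, 8, 5, 7, 6]       # hypothetical sample drawn from some larger population
print(stdev(sample))           # sample standard deviation s, an estimate of the population sigma
```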
In probability theory and statistics, the variance of a random variable, probability distribution, or sample is one measure of statistical dispersion, averaging the
squared distance of its possible values from the expected value (mean). Whereas the mean is a way to describe the location of a distribution, the variance is a way
to capture its scale or degree of being spread out. The unit of variance is the square of the unit of the original variable. The square root of the variance, called the
standard deviation, has the same units as the original variable and can be easier to interpret for this reason.
The variance of a real-valued random variable is its second central moment, and it also happens to be its second cumulant. Just as some distributions do not have
a mean, some do not have a variance either. The mean exists whenever the variance exists, but not vice versa.
Type I error, also known as an "error of the first kind", an α error, or a "false positive": the error of rejecting a null hypothesis when it is actually true. Plainly
speaking, it occurs when we are observing a difference when in truth there is none.
A false positive normally means that a test claims something to be positive, when that is not the case. For example, a pregnancy test with a positive result
(indicating that the person taking the test is pregnant) has produced a false positive in the case where the person is not pregnant.
Type II error, also known as an "error of the second kind", a β error, or a "false negative": the error of failing to reject a null hypothesis when the alternative
hypothesis is the true state of nature. In other words, this is the error of failing to observe a difference when in truth there is one. This type of error can only occur
when the statistician fails to reject the null hypothesis. In the example of a pregnancy test, a type II error occurs if the test reports false when the person is, in fact,
pregnant.
Type I errors (the "false positive"): the error of rejecting the null hypothesis given that it is actually true; e.g., A court finding a person guilty of a crime that they did
not actually commit.
Type II errors (the "false negative"): the error of failing to reject the null hypothesis given that the alternative hypothesis is actually true; e.g., A court finding a
person not guilty of a crime that they did actually commit.
The Z-test is a statistical test used in inference which determines whether the difference between a sample
mean and the population mean is large enough to be statistically significant, that is, if it is unlikely to have occurred by chance.
The Z-test is used primarily with standardized testing to determine if the test scores of a particular sample of test takers are within or outside of the standard
performance of test takers.
In order for the Z-test to be reliable, certain conditions must be met. The most important is that since the Z-test uses the population mean and population standard
deviation, these must be known. The sample must be a simple random sample of the population. If the sample came from a different sampling method, a different
formula must be used. It must also be known that the population varies normally (i.e., that the population is normally distributed, so that the sampling distribution of the sample
mean is normal). If it is not known that the population varies normally, it suffices to have a sufficiently large sample, generally agreed to be ≥ 30 or 40.
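A minimal sketch of a one-sample Z-test under the conditions listed above (known population mean and standard deviation, simple random sample); the population values of 100 and 15 and the sample scores are hypothetical.

```python
import math
from statistics import NormalDist, mean

# Hypothetical standardized-test setting: the population of all test takers is assumed
# to have a known mean of 100 and standard deviation of 15.
mu, sigma = 100.0, 15.0

# Simple random sample of scores from one group of test takers (hypothetical data).
scores = [108, 112, 99, 105, 117, 101, 110, 96, 104, 109,
          111, 103, 98, 107, 113, 100, 106, 102, 115, 97]

n = len(scores)
z = (mean(scores) - mu) / (sigma / math.sqrt(n))   # standardized difference of the sample mean
p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided p-value

print(f"z = {z:.2f}, p = {p_value:.4f}")
# A small p-value suggests the sample mean differs from the population mean
# by more than chance alone would readily explain.
```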
In statistics, a standard score is a dimensionless quantity derived by subtracting the population mean from an individual raw score and then dividing the difference
by the population standard deviation. This conversion process is called standardizing or normalizing.
Standard scores are also called z-values, z-scores, normal scores, or standardised variables.
The standard score indicates how many standard deviations an observation is above or below the mean. It allows comparison of observations from different
normal distributions, which is done frequently in research.
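A tiny sketch of standardizing raw scores from two different (hypothetical) normal distributions; the z-scores make the two observations directly comparable.

```python
def z_score(raw, population_mean, population_sd):
    """Number of standard deviations the raw score lies above (+) or below (-) the mean."""
    return (raw - population_mean) / population_sd

# Hypothetical example: comparing performance on two differently scaled exams.
print(z_score(82, population_mean=70, population_sd=8))      # 1.5 sd above the mean
print(z_score(620, population_mean=500, population_sd=100))  # 1.2 sd above the mean
# The first result is the more unusual observation, even though 620 > 82 as a raw score.
```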
