
Descriptive Statistics

Descriptive statistics quantitatively describe the main features of a collection of data.[1] Descriptive statistics are distinguished from inferential statistics (or inductive statistics), in that descriptive statistics aim to summarize a data set, rather than use the data to learn about the population that the data are thought to represent. This generally means that descriptive statistics, unlike inferential statistics, are not developed on the basis of probability theory.[2]

Statistical Inference
In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation.[1] More substantially, the terms statistical inference, statistical induction and inferential statistics are used to describe systems of procedures that can be used to draw conclusions from datasets arising from systems affected by random variation. In other words, it tells us how close the conclusions drawn from a sample are likely to be to the truth about the underlying population.

Scope of Statistical Inference


For the most part, statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling. More generally, data about a random process is obtained from its observed behavior during a finite period of time. Given a parameter or hypothesis about which one wishes to make inference, statistical inference most often uses: a statistical model of the random process that is supposed to generate the data, and a particular realization of the random process; i.e., a set of data.

The conclusion of a statistical inference is a statistical proposition. Some common forms of statistical proposition are:
- an estimate, i.e., a particular value that best approximates some parameter of interest;
- a confidence interval (or set estimate), i.e., an interval constructed from the data in such a way that, under repeated sampling of datasets, such intervals would contain the true parameter value with the probability at the stated confidence level;
- a credible interval, i.e., a set of values containing, for example, 95% of posterior belief;
- rejection of a hypothesis;[3]
- clustering or classification of data points into groups.

Statistical Model
A statistical model is a formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more other variables. The model is statistical as the variables are not deterministically but stochastically related. In mathematical terms, a statistical model is frequently thought of as a pair (Y, P), where Y is the set of possible observations and P the set of possible probability distributions on Y.

A statistical model has two components: the variable component and the parameter component, which defines the probability distribution.
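To make the (Y, P) formalization concrete, here is a minimal Python sketch (the names are illustrative, not from the text): a Bernoulli model for a single coin flip, where Y is the set of possible observations and each parameter value theta picks out one concrete distribution from the family P.

# A minimal sketch of a statistical model as a pair (Y, P):
# Y is the set of possible observations; P is a family of
# distributions on Y, indexed by the parameter theta.

Y = {0, 1}  # possible observations: 0 = tails, 1 = heads

def bernoulli_pmf(theta):
    """Return the probability distribution on Y defined by parameter theta."""
    assert 0.0 <= theta <= 1.0
    return {1: theta, 0: 1.0 - theta}

# P is the set of all such distributions, one per parameter value;
# choosing theta selects one distribution out of the family.
for theta in (0.25, 0.5, 0.9):
    print(theta, bernoulli_pmf(theta))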

Random Sample
A sample is a subset chosen from a population for investigation; a random sample is one chosen by a method involving an unpredictable component. Random sampling can also refer to taking a number of independent observations from the same probability distribution, without involving any real population. The sample usually is not perfectly representative of the population from which it was drawn; this random variation in the results is termed sampling error.

Types of Random Samples


A simple random sample is selected so that all samples of the same size have an equal chance of being selected from the entire population.

A self-weighting sample, also known as an EPSEM (Equal Probability of Selection Method) sample, is one in which every individual, or object, in the population of interest has an equal opportunity of being selected for the sample. Simple random samples are self-weighting.

Stratified sampling involves selecting independent samples from a number of subpopulations, groups or strata within the population. Great gains in efficiency are sometimes possible from judicious stratification.

Cluster sampling involves selecting the sample units in groups. For example, a sample of telephone calls may be collected by first taking a collection of telephone lines and collecting all the calls on the sampled lines. The analysis of cluster samples must take into account the intra-cluster correlation, which reflects the fact that units in the same cluster are likely to be more similar than two units picked at random.
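The three designs can be illustrated with a short Python sketch using only the standard library; the population, strata, and cluster sizes below are invented purely for illustration.

import random

random.seed(0)
population = [("urban", i) for i in range(800)] + [("rural", i) for i in range(200)]

# Simple random sample: every subset of size n is equally likely.
srs = random.sample(population, 10)

# Stratified sample: independent simple random samples within each
# stratum, here proportional to stratum size (8 urban, 2 rural).
strata = {"urban": [u for u in population if u[0] == "urban"],
          "rural": [u for u in population if u[0] == "rural"]}
stratified = random.sample(strata["urban"], 8) + random.sample(strata["rural"], 2)

# Cluster sample: sample whole groups (e.g., telephone lines), then
# keep every unit (call) in the chosen clusters.
clusters = [population[i:i + 100] for i in range(0, 1000, 100)]
cluster_sample = [unit for c in random.sample(clusters, 2) for unit in c]

print(len(srs), len(stratified), len(cluster_sample))  # 10 10 200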

Equivalence class and Equivalence Relation


In mathematics, given a set X and an equivalence relation ~ on X, the equivalence class of an element a in X is the subset of all elements in X which are equivalent to a:

[a] = { x ∈ X : x ~ a }.

If X is the set of all cars, and ~ is the equivalence relation "has the same color as", then one particular equivalence class consists of all green cars. X / ~ could be naturally identified with the set of all car colors.
The rational numbers can be constructed as the set of equivalence classes of ordered pairs of integers (a,b) with b not zero, where the equivalence relation is defined by (a,b) ~ (c,d) if and only if ad = bc.
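As a small illustration of this construction, the following Python sketch groups integer pairs into equivalence classes under (a, b) ~ (c, d) if and only if ad = bc; the specific pairs chosen are arbitrary.

def equivalent(p, q):
    """(a, b) ~ (c, d) iff a*d == b*c, i.e., the pairs name the same rational."""
    (a, b), (c, d) = p, q
    return a * d == b * c

pairs = [(1, 2), (2, 4), (-3, -6), (1, 3), (2, 6)]

# Partition the pairs into equivalence classes.
classes = []
for p in pairs:
    for cls in classes:
        if equivalent(p, cls[0]):
            cls.append(p)
            break
    else:
        classes.append([p])

print(classes)  # [[(1, 2), (2, 4), (-3, -6)], [(1, 3), (2, 6)]]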

Probability Space
In probability theory, a probability space or a probability triple is a mathematical construct that models a real-world process (or "experiment") consisting of states that occur randomly. A probability space is constructed with a specific kind of situation or experiment in mind. One proposes that each time a situation of that kind arises, the set of possible outcomes is the same and the probability levels are also the same. A probability space consists of three parts:
1. A sample space, Ω, which is the set of all possible outcomes.
2. A set of events, where each event is a set containing zero or more outcomes.
3. The assignment of probabilities to the events, that is, a function from events to probability levels.
An outcome is the result of a single execution of the model. Since individual outcomes might be of little practical use, more complex events are used to characterize groups of outcomes. The collection of all such events is a σ-algebra, F. Finally, there is a need to specify each event's likelihood of happening. This is done using the probability measure function, P.

Once the probability space is established, it is assumed that nature makes its move and selects a single outcome, ω, from the sample space Ω. All the events in F that contain the selected outcome ω (recall that each event is a subset of Ω) are said to have occurred. The selection performed by nature is done in such a way that if the experiment were to be repeated an infinite number of times, the relative frequencies of occurrence of each of the events would coincide with the probabilities prescribed by the function P.

Discrete examples

Example 1. If the experiment consists of just one flip of a perfect coin, then the outcomes are either heads or tails: Ω = {H, T}. The σ-algebra F = 2^Ω contains 2^2 = 4 events, namely: {H} (heads), {T} (tails), {} (neither heads nor tails), and {H, T} (either heads or tails). So, F = {{}, {H}, {T}, {H, T}}. There is a fifty percent chance of tossing heads, and fifty percent for tails. Thus the probability measure in this example is P({}) = 0, P({H}) = 0.5, P({T}) = 0.5, P({H, T}) = 1.

Example 2. The fair coin is tossed three times. There are 8 possible outcomes: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} (here HTH, for example, means that the first time the coin landed heads, the second time tails, and the last time heads again). The complete information is described by the σ-algebra F = 2^Ω of 2^8 = 256 events, where each of the events is a subset of Ω. Alice knows the outcome of the second toss only. Thus her incomplete information is described by the partition Ω = A1 ∪ A2 = {HHH, HHT, THH, THT} ∪ {HTH, HTT, TTH, TTT}, and the corresponding σ-algebra F_Alice = {{}, A1, A2, Ω}. Brian knows only the total number of tails. His partition contains four parts: Ω = B0 ∪ B1 ∪ B2 ∪ B3 = {HHH} ∪ {HHT, HTH, THH} ∪ {TTH, THT, HTT} ∪ {TTT}; accordingly, his σ-algebra F_Brian contains 2^4 = 16 events. The two σ-algebras are incomparable: neither F_Alice ⊆ F_Brian nor F_Brian ⊆ F_Alice; both are sub-σ-algebras of 2^Ω.

Example 3. If 100 voters are to be drawn randomly from among all voters in California and asked whom they will vote for governor, then the set of all sequences of 100 Californian voters would be the sample space Ω. We assume that sampling without replacement is used: only sequences of 100 different voters are allowed. For simplicity an ordered sample is considered, that is, a sequence (Alice, Brian) is different from (Brian, Alice). We also take for granted that each potential voter knows exactly his or her future choice, that is, he or she doesn't choose randomly. Alice knows only whether or not Arnold Schwarzenegger has received at least 60 votes. Her incomplete information is described by the σ-algebra F_Alice that contains: (1) the set of all sequences in Ω where at least 60 people vote for Schwarzenegger; (2) the set of all sequences where fewer than 60 vote for Schwarzenegger; (3) the whole sample space Ω; and (4) the empty set ∅. Brian knows the exact number of voters who are going to vote for Schwarzenegger. His incomplete information is described by the corresponding partition Ω = B0 ∪ B1 ∪ … ∪ B100 (though some of these sets may be empty, depending on the Californian voters), and the σ-algebra F_Brian consists of 2^101 events. In this case, Alice's σ-algebra is a subset of Brian's: F_Alice ⊆ F_Brian. Brian's σ-algebra is in turn a subset of the much larger complete-information σ-algebra 2^Ω consisting of 2^(n(n−1)⋯(n−99)) events, where n is the number of all potential voters in California.
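Example 1 can also be written out in code. The sketch below is a rough illustration, not a general-purpose library: it builds the sample space, takes the σ-algebra to be the full power set (always possible for a finite sample space), and defines the probability measure P from the singleton probabilities.

from itertools import combinations

omega = frozenset({"H", "T"})          # sample space

def power_set(s):
    """All subsets of s: the sigma-algebra for a finite sample space."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

events = power_set(omega)              # 2^2 = 4 events

def P(event):
    """Probability measure: sum the probabilities of the outcomes in the event."""
    prob = {"H": 0.5, "T": 0.5}
    return sum(prob[o] for o in event)

for e in events:
    print(set(e) or "{}", P(e))        # {} -> 0, {H} -> 0.5, {T} -> 0.5, {H, T} -> 1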

Sequences
In mathematics, a sequence is an ordered list of objects (or events). Like a set, it contains members (also called elements or terms), and the number of terms (possibly infinite) is called the length of the sequence. Unlike a set, order matters, and exactly the same elements can appear multiple times at different positions in the sequence. A sequence is a discrete function.

For example, (C, R, Y) is a sequence of letters that differs from (Y, C, R), as the ordering matters. Sequences can be finite, as in this example, or infinite, such as the sequence of all even positive integers (2, 4, 6,...). Finite sequences are sometimes known as strings or words and infinite sequences as streams. The empty sequence ( ) is included in most notions of sequence, but may be excluded depending on the context.

[Figure omitted: an infinite sequence of real numbers, plotted in blue. The sequence is neither increasing, nor decreasing, nor convergent, nor Cauchy. It is, however, bounded.]

Finite and Infinite Sequences


A more formal definition of a finite sequence with terms in a set S is a function from {1, 2, ..., n} to S for some n > 0. An infinite sequence in S is a function from {1, 2, ...} to S. For example, the sequence of prime numbers (2, 3, 5, 7, 11, ...) is the function 1 → 2, 2 → 3, 3 → 5, 4 → 7, 5 → 11, .... A sequence of a finite length n is also called an n-tuple. Finite sequences include the empty sequence ( ) that has no elements. A function from all integers into a set is sometimes called a bi-infinite sequence or two-way infinite sequence. An example is the bi-infinite sequence of all even integers (..., −4, −2, 0, 2, 4, 6, 8, ...).
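A short Python sketch of the "sequence as function" view: the prime sequence as a function from the index set {1, 2, ...} to S, matching 1 → 2, 2 → 3, 3 → 5 above. The trial-division test here is a naive illustration, not an efficient primality check.

def nth_prime(n):
    """Return the n-th prime, treating the sequence as a function of n >= 1."""
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        # naive trial division up to sqrt(candidate)
        if all(candidate % d != 0 for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate

print([nth_prime(n) for n in range(1, 6)])  # [2, 3, 5, 7, 11]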

Independent random variables


Two random variables X and Y are independent if X conveys no information about Y and Y conveys no information about X. If two variables are independent, when we receive information about one of the two, we do not change our assessment of the probability distribution of the other.
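A hedged empirical illustration in Python: for two independent dice, the joint frequency of a pair of faces should be close to the product of the marginal frequencies, i.e., P(X = a, Y = b) = P(X = a) · P(Y = b).

import random

random.seed(1)
N = 100_000
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(N)]

# Empirical joint frequency of (X=1, Y=1) versus product of marginals.
joint_11 = sum(1 for x, y in rolls if x == 1 and y == 1) / N
px1 = sum(1 for x, _ in rolls if x == 1) / N
py1 = sum(1 for _, y in rolls if y == 1) / N

print(joint_11, px1 * py1)  # both should be near 1/36 ≈ 0.0278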

Chebyshev's Inequality

In very common language, it means that regardless of the nature of the underlying distribution that defines the random variable (some process in the world), there are guaranteed bounds on what percentage of observations will lie within k standard deviations of the mean. So you need the following:
- the mean of the probability distribution
- the standard deviation or variance of the distribution (recall that variance is stdev^2)

Let's do a simple example: The length of a knife is on average 5 inches, with a standard deviation of 1/10th of 1 inch. What percentage of observations will be between 4.75 and 5.25 inches long? We can approach this several ways, but let us take the most intuitive route: The mean is 5 inches, so we're trying to find how often knives will be within 1/4 inch on either side of the mean. Now, if a knife is 5.25 inches, how many standard deviations is the length off by? Recall that our standard deviation is 0.1 inches. So, 0.25 inches / 0.1 inch = 2.5 standard deviations. Note that this means that: mean +/- 2.5 standard deviations = the range of lengths 4.75 to 5.25 inches. Now we can solve the problem:

Chebyshev's inequality states that P(|X − μ| ≥ kσ) ≤ 1/k^2. Let's think about this statement. It says the chance that the absolute difference between the observed length and the mean is at least k standard deviations (k is arbitrary, it could be anything) is at most 1/k^2. Now k is the number of standard deviations. So this bounds the probability that the observation will be MORE than k standard deviations away. We are trying to find the probability that it's on some interval, namely, 4.75 to 5.25 inches. So, if the above is P(A), then the probability we want is at least 1 − P(A). So let's do it:

P(within k standard deviations) ≥ 1 − 1/k^2
= 1 − 1/2.5^2
= 1 − 1/6.25
= 1 − 0.16
= 0.84 (84%).

So, that's it. Keep in mind this assumes that the MEAN IS KNOWN and the VARIANCE OR STDEV IS KNOWN. If those are not known, you can't use this. And for completeness, what is the probability that the knives observed will fall outside the 4.75 inch to 5.25 inch interval? Well, it's just 1 − P(that they're inside the interval), so 1 − 84% = 16%, OR, and I hope you're thinking about this already, 1/2.5^2 = 1/6.25 = 16%. So the odds that they are either INSIDE or OUTSIDE the interval, the union of 2 mutually exclusive events, is just their sum: 84% + 16% = 100%. (Strictly speaking, Chebyshev gives bounds, not exact values: at least 84% inside and at most 16% outside, which together still account for 100%.)
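A small Python sketch of the knife example: it computes Chebyshev's guaranteed lower bound and compares it against a simulation. The normal distribution used for the simulated lengths is purely an assumption for illustration; Chebyshev's bound holds for ANY distribution with the given mean and standard deviation, which is why the observed fraction comes out well above 0.84.

import random

random.seed(2)
mu, sigma, k = 5.0, 0.1, 2.5

# Chebyshev: at least 1 - 1/k^2 of observations lie within k sigma of the mean.
chebyshev_bound = 1 - 1 / k**2          # 0.84

# Simulate knife lengths (normality is an assumption, not required by the bound).
lengths = [random.gauss(mu, sigma) for _ in range(100_000)]
observed = sum(1 for x in lengths if abs(x - mu) < k * sigma) / len(lengths)

print(f"Chebyshev lower bound: {chebyshev_bound:.2f}")   # 0.84
print(f"Observed (normal assumption): {observed:.3f}")   # ~0.988, above the bound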

Moments
In mathematics, a moment is, loosely speaking, a quantitative measure of the shape of a set of points. The "second moment", for example, is widely used and measures the "width" (in a particular sense) of a set of points in one dimension, or in higher dimensions measures the shape of a cloud of points as it could be fit by an ellipsoid. Other moments describe other aspects of a distribution, such as how the distribution is skewed from its mean, or peaked. The mathematical concept is closely related to the concept of moment in physics, although moment in physics is often represented somewhat differently. Any distribution can be characterized by a number of features (such as the mean, the variance, the skewness, etc.), and the moments of a function[1] describe the nature of its distribution. The 1st moment is denoted by μ1. The first moment of the distribution of the random variable X is the expected value E[X], i.e., the population mean (if the first moment exists). In higher orders, the central moments (moments about the mean) are more interesting than the moments about zero. The kth central moment of a real-valued random variable X with expected value μ is:

μ_k = E[(X − μ)^k].

The first central moment is thus 0. The zeroth central moment, μ_0, is one. See also central moment. Other moments may also be defined. For example, the nth inverse moment about zero is E(X^−n) and the nth logarithmic moment about zero is E(ln^n(X)).
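A minimal Python sketch of these definitions, computing raw moments and central moments directly from a sample (the data values are arbitrary):

def raw_moment(xs, k):
    """k-th moment about zero: E[X^k], estimated from a sample."""
    return sum(x**k for x in xs) / len(xs)

def central_moment(xs, k):
    """k-th central moment: E[(X - mu)^k], estimated from a sample."""
    mu = raw_moment(xs, 1)              # first raw moment = mean
    return sum((x - mu)**k for x in xs) / len(xs)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(raw_moment(data, 1))      # 5.0   (mean)
print(central_moment(data, 1))  # 0.0   (first central moment is always 0)
print(central_moment(data, 2))  # 4.0   (variance; see the next section)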

Variance
The second central moment about the mean is the variance. Its positive square root is the standard deviation σ.

Normalized moments

The normalized nth central moment or standardized moment is the nth central moment divided by σ^n; the normalized nth central moment of X is E((X − μ)^n)/σ^n. These normalized central moments are dimensionless quantities, which represent the distribution independently of any linear change of scale.

Skewness

The third central moment is a measure of the lopsidedness of the distribution; any symmetric distribution will have a third central moment, if defined, of zero. The normalized third central moment is called the skewness, often γ. A distribution that is skewed to the left (the tail of the distribution is heavier on the left) will have a negative skewness. A distribution that is skewed to the right (the tail of the distribution is heavier on the right) will have a positive skewness. For distributions that are not too different from the normal distribution, the median will be somewhere near μ − γσ/6; the mode about μ − γσ/2.

Kurtosis
The fourth central moment is a measure of whether the distribution is tall and skinny or short and squat, compared to the normal distribution of the same variance. Since it is the expectation of a fourth power, the fourth central moment, where defined, is always non-negative; and except for a point distribution, it is always strictly positive. The fourth central moment of a normal distribution is 3σ^4. The kurtosis κ is defined to be the normalized fourth central moment minus 3. (Equivalently, it is the fourth cumulant divided by the square of the variance.) Some authorities[3][4] do not subtract three, but it is usually more convenient to have the normal distribution at the origin of coordinates. If a distribution has a peak at the mean and long tails, the fourth moment will be high and the kurtosis positive (leptokurtic); and conversely; thus, bounded distributions tend to have low kurtosis (platykurtic). The kurtosis can be positive without limit, but κ must be greater than or equal to γ^2 − 2; equality only holds for binary distributions. For unbounded skew distributions not too far from normal, κ tends to be somewhere in the area of γ^2 and 2γ^2. The inequality can be proven by considering E[(T^2 − aT − 1)^2], where T = (X − μ)/σ. This is the expectation of a square, so it is non-negative whatever a is; on the other hand, it's also a quadratic in a. Its discriminant must be non-positive, which gives the required relationship.
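Tying the last two sections together, a short Python sketch computes the skewness μ3/σ^3 and the (excess) kurtosis μ4/σ^4 − 3 from a sample; the data are arbitrary illustration values.

def central_moment(xs, k):
    """k-th central moment E[(X - mu)^k], estimated from a sample."""
    mu = sum(xs) / len(xs)
    return sum((x - mu)**k for x in xs) / len(xs)

def skewness(xs):
    """Normalized third central moment: mu_3 / sigma^3."""
    sigma = central_moment(xs, 2) ** 0.5
    return central_moment(xs, 3) / sigma**3

def kurtosis(xs):
    """Normalized fourth central moment minus 3 (so the normal gives 0)."""
    sigma2 = central_moment(xs, 2)
    return central_moment(xs, 4) / sigma2**2 - 3

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(skewness(data))   # > 0: the right tail (toward 9) is heavier
print(kurtosis(data))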
