
Random variable
In probability theory, a random variable, or stochastic variable, is a way of
assigning a value to each possible outcome, that is, to each element of a sample space.
These values might represent the possible outcomes of an experiment, or the
potential values of a quantity whose value is uncertain (e.g., as a result of
incomplete information or imprecise measurements). Intuitively, a random variable
can be thought of as a quantity whose value is not fixed, but which can take on
different values; normally, a probability distribution is used to describe the
probability of different values occurring. Realizations of a random variable are
called random variates.

Random variables are usually real-valued, but one can consider arbitrary types
such as boolean values, complex numbers, vectors, matrices, sequences, trees, sets,
shapes, manifolds, functions, and processes. The term random element is used to
encompass all such related concepts. A related concept is the stochastic process, a
set of indexed random variables (typically indexed by time or space).

Introduction
Real-valued random variables (those whose range is the real numbers) are used in
the sciences to make predictions based on data obtained from scientific
experiments. In addition to scientific applications, random variables were
developed for the analysis of games of chance and stochastic events. In such
instances, the function that maps the outcome to a real number is often the identity
function or similarly trivial function, and not explicitly described. In many cases,
however, it is useful to consider random variables that are functions of other
random variables, and then the mapping function included in the definition of a
random variable becomes important. As an example, the square of a random
variable distributed according to a standard normal distribution is itself a random
variable, with a chi-square distribution. One way to think of this is to imagine
generating a large number of samples from a standard normal distribution,
squaring each one, and plotting a histogram of the values observed. With enough
samples, the graph of the histogram will approximate the density function of a chi-
square distribution with one degree of freedom.
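The histogram experiment just described can be sketched in a few lines of Python. This is only an illustration (the seed and sample size are arbitrary choices), and instead of plotting we check the first two moments, since a chi-square distribution with one degree of freedom has mean 1 and variance 2:

```python
import random

random.seed(0)

# Square a large number of standard normal samples; the squares follow a
# chi-square distribution with one degree of freedom (mean 1, variance 2).
n = 200_000
squares = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]

mean = sum(squares) / n
var = sum((s - mean) ** 2 for s in squares) / n
print(abs(mean - 1) < 0.05, abs(var - 2) < 0.2)
```

With enough samples both checks pass, mirroring the convergence of the histogram to the chi-square density.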

Another example is the sample mean, which is the average of a number of samples.
When these samples are independent observations of the same random event they
can be called independent identically distributed random variables. Since each
sample is a random variable, the sample mean is a function of random variables
and hence a random variable itself, whose distribution can be computed and
properties determined.
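A quick numerical illustration of this point (a sketch; the roll counts are arbitrary): each call to `sample_mean` below produces one realization of the sample-mean random variable, and repeating the experiment shows its values scatter around the die's expected value of 3.5.

```python
import random

random.seed(1)

# One realization of the sample mean of n i.i.d. fair-die rolls.
def sample_mean(n):
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Repeating the experiment gives many realizations of this random variable.
means = [sample_mean(100) for _ in range(2000)]
grand = sum(means) / len(means)
print(abs(grand - 3.5) < 0.05)
```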

One of the reasons that real-valued random variables are so commonly considered
is that the expected value (a type of average) and variance (a measure of the
"spread", or extent to which the values are dispersed) of the variable can be
computed.

There are two types of random variables: discrete and continuous.[1] A discrete
random variable maps outcomes to values of a countable set (e.g., the integers),
with each value in the range having probability greater than or equal to zero. A
continuous random variable maps outcomes to values of an uncountable set (e.g.,
the real numbers). For a continuous random variable, the probability of any
specific value is zero, whereas the probability of some infinite set of values (such
as an interval of non-zero length) may be positive. A random variable can be
"mixed", with part of its probability spread out over an interval like a typical
continuous variable, and part of it concentrated on particular values like a discrete
variable. These classifications are equivalent to the categorization of probability
distributions.

The expected value of random vectors, random matrices, and similar aggregates of
fixed structure is defined as the aggregation of the expected value computed over
each individual element. The concept of "variance of a random vector" is normally
expressed through a covariance matrix. No generally agreed-upon definition of
expected value or variance exists for cases other than those just discussed.

There are two possible outcomes for a coin toss: heads, or tails. The possible
outcomes for one fair coin toss can be described using the following random
variable:

    X = heads, if the coin lands heads;   X = tails, if the coin lands tails,

and if the coin is equally likely to land on either side then it has a probability mass
function given by:

    f_X(x) = 1/2   for x ∈ {heads, tails}.

We can also introduce a real-valued random variable V as follows:

    V = 1, if X = heads;   V = 0, if X = tails.

A random variable can also be used to describe the process of rolling a die and the
possible outcomes. The most obvious representation is to take the set {1, 2, 3, 4, 5,
6} as the sample space, defining the random variable X equal to the number rolled.
In this case, X(ω) = ω, the number rolled, and each of the six values has probability 1/6.

An example of a continuous random variable would be one based on a spinner that
can choose a horizontal direction. Then the values taken by the random variable
are directions. We could represent these directions by North West, East South East,
etc. However, it is commonly more convenient to map the sample space to a
random variable which takes values which are real numbers. This can be done, for
example, by mapping a direction to a bearing in degrees clockwise from North.
The random variable then takes values which are real numbers from the interval [0,
360), with all parts of the range being "equally likely". In this case, X = the angle
spun. Any real number has probability zero of being selected, but a positive
probability can be assigned to any range of values. For example, the probability of
choosing a number in [0, 180] is ½. Instead of speaking of a probability mass
function, we say that the probability density of X is 1/360. The probability of a
subset of [0, 360) can be calculated by multiplying the measure of the set by 1/360.
In general, the probability of a set for a given continuous random variable can be
calculated by integrating the density over the given set.
An example of a random variable of mixed type would be based on an experiment
where a coin is flipped and the spinner is spun only if the result of the coin toss is
heads. If the result is tails, X = −1; otherwise X = the value of the spinner as in the
preceding example. There is a probability of ½ that this random variable will have
the value −1. Other ranges of values would have half the probability of the last
example.
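The mixed random variable just described is easy to simulate (a sketch; the sample size is an arbitrary choice): half the probability mass sits on the single value −1, and the rest is spread uniformly over [0, 360).

```python
import random

random.seed(2)

# Coin-plus-spinner variable: tails gives X = -1, heads a uniform angle.
def mixed():
    if random.random() < 0.5:
        return -1.0
    return random.uniform(0.0, 360.0)

draws = [mixed() for _ in range(100_000)]
p_point = sum(1 for x in draws if x == -1.0) / len(draws)         # atom at -1
p_half = sum(1 for x in draws if 0.0 <= x <= 180.0) / len(draws)  # half the circle
print(abs(p_point - 0.5) < 0.01, abs(p_half - 0.25) < 0.01)
```

The empirical frequencies show the discrete part (probability ≈ ½ at −1) and the continuous part (probability ≈ ¼ for the half-circle [0, 180]).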

Formal definition
Let (Ω, F, P) be a probability space, and (E, ℰ) a measurable space. Then an
(E, ℰ)-valued random variable is a function X: Ω → E which is (F, ℰ)-measurable.
That is, a function such that for every set B ∈ ℰ, its preimage lies
in F: X⁻¹(B) ∈ F, where X⁻¹(B) = {ω : X(ω) ∈ B}.[2] This definition
ensures that a probability distribution is easy to define, since we can measure
any set B in the target space by looking at its preimage, which is guaranteed to be
measurable.
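On a finite sample space the definition can be computed directly. The sketch below (a fair-die space with the identity map; all names are illustrative) finds the probability that X lands in a set of values by measuring the preimage of that set:

```python
from fractions import Fraction

# Probability space for a fair die: every outcome has measure 1/6.
omega = {1, 2, 3, 4, 5, 6}
P = {w: Fraction(1, 6) for w in omega}

def X(w):          # the random variable: the number rolled
    return w

def preimage(B):   # X^-1(B) = {w : X(w) in B}
    return {w for w in omega if X(w) in B}

# P(X in {2, 4, 6}) is the measure of the preimage of {2, 4, 6}.
prob_even = sum(P[w] for w in preimage({2, 4, 6}))
print(prob_even)  # -> 1/2
```

On a finite space every subset is measurable, so the measurability condition is automatic; in general it is exactly what makes `prob_even` well defined.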

When E is a topological space, then the most common choice for the σ-algebra ℰ is
to take it equal to the Borel σ-algebra B(E), which is the σ-algebra generated by the
collection of all open sets in E. In such case the (E, ℰ)-valued random variable is
called an E-valued random variable. Moreover, when the space E is the real line R,
then such a real-valued random variable is called simply a random variable.


 
Real-valued random variables

In this case the observation space is the real numbers with the Borel σ-algebra.
Recall that (Ω, F, P) is the probability space. For a real observation space, the
function X: Ω → R is a real-valued random variable if

    {ω : X(ω) ≤ r} ∈ F   for every r ∈ R.

This definition is a special case of the above because the set {(−∞, r] : r ∈ R}
generates the Borel σ-algebra on the real numbers, and it is enough to check
measurability on a generating set. (Here we are using the fact that
X⁻¹((−∞, r]) = {ω : X(ω) ≤ r}.)

Distribution functions of random variables

Associating a cumulative distribution function (CDF) with a random variable is a
generalization of assigning a value to a variable. If the CDF is a (right-continuous)
Heaviside step function then the variable takes on the value at the jump with
probability 1. In general, the CDF specifies the probability that the variable takes
on particular values.

If a random variable X: Ω → R defined on the probability space (Ω, F, P) is
given, we can ask questions like "How likely is it that the value of X is bigger than
2?". This is the same as the probability of the event {ω : X(ω) > 2}, which is often
written as P(X > 2) for short, and easily obtained since

    P(X > 2) = P({ω ∈ Ω : X(ω) > 2}).
Recording all these probabilities of output ranges of a real-valued random variable
X yields the probability distribution of X. The probability distribution "forgets"
about the particular probability space used to define X and only records the
probabilities of various values of X. Such a probability distribution can always be
captured by its cumulative distribution function

    F_X(x) = P(X ≤ x)

and sometimes also using a probability density function. In measure-theoretic
terms, we use the random variable X to "push forward" the measure P on Ω to a
measure dF on R. The underlying probability space Ω is a technical device used to
guarantee the existence of random variables, and sometimes to construct them. In
practice, one often disposes of the space Ω altogether and just puts a measure on R
that assigns measure 1 to the whole real line, i.e., one works with probability
distributions instead of random variables.
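This "forgetting" can be seen in simulation (a sketch; seeds and sample sizes are arbitrary): two random variables built on different underlying sample spaces, a coin flip and a die roll, induce the same distribution, Bernoulli(1/2), and so are indistinguishable from their output frequencies alone.

```python
import random

random.seed(3)
n = 100_000

# Two random variables built on different underlying spaces...
coin = [1.0 if random.random() < 0.5 else 0.0 for _ in range(n)]     # coin flip
die = [1.0 if random.randint(1, 6) <= 3 else 0.0 for _ in range(n)]  # low die roll

# ...induce the same pushforward distribution: both are Bernoulli(1/2).
p_coin = sum(coin) / n
p_die = sum(die) / n
print(abs(p_coin - 0.5) < 0.01, abs(p_die - 0.5) < 0.01)
```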


 

Moments

The probability distribution of a random variable is often characterised by a small
number of parameters, which also have a practical interpretation. For example, it is
often enough to know what its "average value" is. This is captured by the
mathematical concept of the expected value of a random variable, denoted E[X],
also called the first moment. In general, E[f(X)] is not equal to f(E[X]). Once the
"average value" is known, one could then ask how far from this average value the
values of X typically are, a question that is answered by the variance and standard
deviation of a random variable. E[X] can be viewed intuitively as an average
obtained from an infinite population, the members of which are particular
evaluations of X.

Mathematically, this is known as the (generalised) problem of moments: for a
given class of random variables X, find a collection {f_i} of functions such that the
expectation values E[f_i(X)] fully characterise the distribution of the random
variable X.
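The caveat that E[f(X)] generally differs from f(E[X]) can be checked numerically. A sketch with f(x) = x² and X uniform on [0, 1] (seed and sample size arbitrary); the gap is exactly the variance, 1/12:

```python
import random

random.seed(4)
draws = [random.uniform(0.0, 1.0) for _ in range(100_000)]

e_x = sum(draws) / len(draws)                  # E[X]   ~ 1/2
e_x2 = sum(d * d for d in draws) / len(draws)  # E[X^2] ~ 1/3

# E[X^2] - (E[X])^2 is the variance, 1/12 for uniform(0, 1).
gap = e_x2 - e_x ** 2
print(abs(gap - 1 / 12) < 0.005)
```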

Functions of random variables

If we have a random variable X on Ω and a Borel measurable function
g: R → R, then Y = g(X) will also be a random variable on Ω, since the
composition of measurable functions is also measurable. (However, this is not true
if g is merely Lebesgue measurable.) The same procedure that allowed one to go
from a probability space (Ω, P) to (R, dF_X) can be used to obtain the distribution
of Y = g(X). The cumulative distribution function of Y is

    F_Y(y) = P(g(X) ≤ y).

If the function g is invertible, i.e. g⁻¹ exists, and is increasing, then the previous
relation can be extended to obtain

    F_Y(y) = P(g(X) ≤ y) = P(X ≤ g⁻¹(y)) = F_X(g⁻¹(y)),

and, again with the same hypotheses of invertibility of g, assuming also
differentiability, we can find the relation between the probability density functions
by differentiating both sides with respect to y, in order to obtain

    f_Y(y) = f_X(g⁻¹(y)) |d g⁻¹(y)/dy|.

If there is no invertibility of g but each y admits at most a countable number of
roots (i.e. a finite, or countably infinite, number of x_i such that y = g(x_i)) then the
previous relation between the probability density functions can be generalized with

    f_Y(y) = Σ_i f_X(x_i) |d g_i⁻¹(y)/dy|,

where x_i = g_i⁻¹(y). The formulas for densities do not demand g to be increasing.

Example 1

Let X be a real-valued, continuous random variable and let Y = X².

    F_Y(y) = P(X² ≤ y).

If y < 0, then P(X² ≤ y) = 0, so

    F_Y(y) = 0   if y < 0.

If y ≥ 0, then

    P(X² ≤ y) = P(|X| ≤ √y) = P(−√y ≤ X ≤ √y),

so

    F_Y(y) = F_X(√y) − F_X(−√y)   if y ≥ 0.
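The identity F_Y(y) = F_X(√y) − F_X(−√y) can be sanity-checked by simulation (a sketch; X standard normal, y = 1 an arbitrary test point, seed arbitrary). The events {X² ≤ y} and {−√y ≤ X ≤ √y} are the same set of outcomes, so their empirical frequencies agree on every sample:

```python
import math
import random

random.seed(5)
draws = [random.gauss(0.0, 1.0) for _ in range(100_000)]
y = 1.0

# {X^2 <= y} and {-sqrt(y) <= X <= sqrt(y)} are the same event.
lhs = sum(1 for x in draws if x * x <= y) / len(draws)
rhs = sum(1 for x in draws if -math.sqrt(y) <= x <= math.sqrt(y)) / len(draws)
print(lhs == rhs)
```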

Example 2

Suppose X is a random variable with cumulative distribution

    F_X(x) = P(X ≤ x) = 1 / (1 + e^(−x))^θ,

where θ > 0 is a fixed parameter. Consider the random variable Y = log(1 + e^(−X)). Then,

    F_Y(y) = P(Y ≤ y) = P(log(1 + e^(−X)) ≤ y) = P(X ≥ −log(e^y − 1)).

The last expression can be calculated in terms of the cumulative distribution of X,
so

    F_Y(y) = 1 − F_X(−log(e^y − 1)) = 1 − (1 + (e^y − 1))^(−θ) = 1 − e^(−yθ).
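Assuming F_X(x) = (1 + e^(−x))^(−θ) as in this example, the closed form F_Y(y) = 1 − e^(−yθ) can be tested by inverse-transform sampling (a sketch; θ = 2, the test point y, and the seed are arbitrary choices): solve u = F_X(x) for x, map through Y = log(1 + e^(−X)), and compare the empirical CDF with the prediction.

```python
import math
import random

random.seed(6)
theta = 2.0

def sample_y():
    # Inverse-transform a draw from F_X(x) = (1 + e^(-x))^(-theta), then
    # apply the transformation Y = log(1 + e^(-X)).
    u = random.random()
    x = -math.log(u ** (-1.0 / theta) - 1.0)
    return math.log(1.0 + math.exp(-x))

ys = [sample_y() for _ in range(100_000)]
y = 0.5
empirical = sum(1 for v in ys if v <= y) / len(ys)
predicted = 1.0 - math.exp(-y * theta)  # the F_Y derived above
print(abs(empirical - predicted) < 0.01)
```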

Equivalence of random variables

There are several different senses in which random variables can be considered to
be equivalent. Two random variables can be equal, equal almost surely, or equal in
distribution.

In increasing order of strength, the precise definition of these notions of
equivalence is given below.

Equality in distribution

If the sample space is a subset of the real line, a possible definition is that random
variables X and Y are equal in distribution if they have the same distribution
functions:

    P(X ≤ x) = P(Y ≤ x)   for all x.
Two random variables having equal moment generating functions have the same
distribution. This provides, for example, a useful method of checking equality of
certain functions of i.i.d. random variables. However, the moment generating
function exists only for distributions that are good enough.

Almost sure equality

Two random variables X and Y are equal almost surely if, and only if, the
probability that they are different is zero:

    P(X ≠ Y) = 0.

For all practical purposes in probability theory, this notion of equivalence is as
strong as actual equality. It is associated to the following distance:

    d∞(X, Y) = ess sup_ω |X(ω) − Y(ω)|,

where "ess sup" represents the essential supremum in the sense of measure theory.

Equality

Finally, the two random variables X and Y are equal if they are equal as functions
on their measurable space:

    X(ω) = Y(ω)   for all ω.

Convergence

Much of mathematical statistics consists in proving convergence results for certain
sequences of random variables; see for instance the law of large numbers and the
central limit theorem.

There are various senses in which a sequence (X_n) of random variables can
converge to a random variable X. These are explained in the article on convergence
of random variables.
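One such sense, convergence in probability of sample means (the weak law of large numbers), is easy to observe numerically (a sketch; the prefix sizes and seed are arbitrary): means over ever-larger prefixes of fair-coin flips settle near 0.5.

```python
import random

random.seed(7)
flips = [random.randint(0, 1) for _ in range(100_000)]

# Sample means over growing prefixes: a sequence of random variables X_n
# that converges (in probability) to the constant 0.5.
prefix_means = [sum(flips[:n]) / n for n in (100, 10_000, 100_000)]

errors = [abs(m - 0.5) for m in prefix_means]
print(errors[-1] < 0.01)
```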

Discrete probability distribution

The probability mass function of a discrete probability distribution. The
probabilities of the singletons {1}, {3}, and {7} are respectively 0.2, 0.5, 0.3. A set
not containing any of these points has probability zero.

The cdf of a discrete probability distribution, of a continuous probability
distribution, and of a distribution which has both a continuous part and a discrete
part.
In probability theory and statistics, a discrete probability distribution is a
probability distribution characterized by a probability mass function. Thus,
the distribution of a random variable X is discrete, and X is then called a
discrete random variable, if

    Σ_u P(X = u) = 1

as u runs through the set of all possible values of X. It follows that such a
random variable can assume only a finite or countably infinite number of
values. That is, the possible values might be listed, although the list might be
infinite. For example, count observations such as the numbers of birds in
flocks comprise only natural number values {0, 1, 2, ...}. By contrast,
continuous observations such as the weights of birds comprise real number
values and would typically be modeled by a continuous probability
distribution such as the normal.
In cases more frequently considered, this set of possible values is a
topologically discrete set in the sense that all its points are isolated points.
But there are discrete random variables for which this countable set is dense
on the real line (for example, a distribution over rational numbers).
Among the most well-known discrete probability distributions that are used
for statistical modeling are the Poisson distribution, the Bernoulli
distribution, the binomial distribution, the geometric distribution, and the
negative binomial distribution. In addition, the discrete uniform distribution
is commonly used in computer programs that make equal-probability
random selections between a number of choices.

Alternative description
Equivalently to the above, a discrete random variable can be defined as a
random variable whose cumulative distribution function (cdf) increases only
by jump discontinuities, that is, its cdf increases only where it "jumps" to a
higher value, and is constant between those jumps. The points where jumps
occur are precisely the values which the random variable may take. The
number of such jumps may be finite or countably infinite. The set of
locations of such jumps need not be topologically discrete; for example, the
cdf might jump at each rational number.
Consequently, a discrete probability distribution is often represented as a
generalized probability density function involving Dirac delta functions,
which substantially unifies the treatment of continuous and discrete
distributions. This is especially useful when dealing with probability
distributions involving both a continuous and a discrete part.
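For the probability mass function with masses 0.2, 0.5 and 0.3 at the points 1, 3 and 7 (the example pictured earlier), the cdf can be tabulated directly; it is constant except for jumps at the three support points. A sketch using exact fractions to avoid float noise:

```python
from fractions import Fraction

# pmf with atoms at 1, 3 and 7; any set avoiding them has probability zero.
pmf = {1: Fraction(1, 5), 3: Fraction(1, 2), 7: Fraction(3, 10)}

def cdf(x):
    # The cdf only increases by jumps, exactly at the support points.
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

print(cdf(0), cdf(2), cdf(5), cdf(10))  # -> 0 1/5 7/10 1
```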



Representation in terms of indicator functions

For a discrete random variable X, let u_0, u_1, ... be the values it can take with
non-zero probability. Denote

    Ω_i = X⁻¹(u_i) = {ω : X(ω) = u_i},   i = 0, 1, 2, ...

These are disjoint sets, and by formula (1)

    P(⋃_i Ω_i) = Σ_i P(Ω_i) = Σ_i P(X = u_i) = 1.

It follows that the probability that X takes any value except for u_0, u_1, ... is
zero, and thus one can write X as

    X(ω) = Σ_i u_i 1_{Ω_i}(ω)

except on a set of probability zero, where 1_A is the indicator function of A.
This may serve as an alternative definition of discrete random variables.
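The identity X = Σ_i u_i 1_{Ω_i} can be verified outcome by outcome on a small space (a sketch; the six-point space and the squaring map are arbitrary illustrations):

```python
# Partition the sample space by the value X takes, then rebuild X from
# indicator functions of the partition sets.
omega = [1, 2, 3, 4, 5, 6]

def X(w):
    return w * w  # an arbitrary discrete random variable for illustration

values = sorted({X(w) for w in omega})
partition = {u: {w for w in omega if X(w) == u} for u in values}

def indicator(A, w):
    return 1 if w in A else 0

def X_rebuilt(w):
    return sum(u * indicator(partition[u], w) for u in values)

print(all(X(w) == X_rebuilt(w) for w in omega))
```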


Probability distribution

In probability theory and statistics, a probability distribution identifies either the
probability of each value of a random variable (when the variable is discrete), or
the probability of the value falling within a particular interval (when the variable is
continuous).[1] The probability distribution describes the range of possible values
that a random variable can attain and the probability that the value of the random
variable is within any (measurable) subset of that range.


When the random variable takes values in the set of real numbers, the probability
distribution is completely described by the cumulative distribution function, whose
value at each real x is the probability that the random variable is smaller than or
equal to x.

The concept of the probability distribution and the random variables which they
describe underlies the mathematical discipline of probability theory, and the
science of statistics. There is spread or variability in almost any value that can be
measured in a population (e.g. height of people, durability of a metal, sales growth,
traffic flow, etc.); almost all measurements are made with some intrinsic error; in
physics many processes are described probabilistically, from the kinetic properties
of gases to the quantum mechanical description of fundamental particles. For these
and many other reasons, simple numbers are often inadequate for describing a
quantity, while probability distributions are often more appropriate.

There are various probability distributions that show up in many different
applications. Two of the most important ones are the normal distribution and the
categorical distribution. The normal distribution, also known as the Gaussian
distribution, has a familiar "bell curve" shape and approximates many different
naturally occurring distributions over real numbers. The categorical distribution
describes the result of an experiment with a fixed, finite number of outcomes. For
example, the toss of a fair coin is a categorical distribution, where the possible
outcomes are heads and tails, each with probability 1/2.

Formal definition

In the measure-theoretic formalization of probability theory, a random variable is
defined as a measurable function X from a probability space (Ω, F, P) to a
measurable space (E, ℰ). A probability distribution is the pushforward measure
X∗P = P X⁻¹ on (E, ℰ).

Probability distributions of real-valued random variables

Because a probability distribution Pr on the real line is determined by the
probability of a real-valued random variable X being in a half-open interval
(−∞, x], the probability distribution is completely characterized by its cumulative
distribution function:

    F(x) = Pr[X ≤ x]   for all x ∈ R.

A probability distribution is called discrete if its cumulative distribution function
only increases in jumps. More precisely, a probability distribution is discrete if
there is a finite or countable set whose probability is 1.

For many familiar discrete distributions, the set of possible values is topologically
discrete in the sense that all its points are isolated points. But, there are discrete
distributions for which this countable set is dense on the real line.

Discrete distributions are characterized by a probability mass function p such that

    Pr[X = x] = p(x)   and   Σ_x p(x) = 1.

By one convention, a probability distribution is called continuous if its
cumulative distribution function is continuous and, therefore,
the probability measure of singletons Pr[X = x] = 0 for all x.

Another convention reserves the term continuous probability distribution for
absolutely continuous distributions. These distributions can be characterized by a
probability density function: a non-negative Lebesgue integrable function f defined
on the real numbers such that

    F(x) = Pr[X ≤ x] = ∫_{−∞}^{x} f(t) dt.

Discrete distributions and some continuous distributions (like the Cantor
distribution) do not admit such a density.

Terminology

The support of a distribution is the smallest closed interval/set whose complement
has probability zero. It may be understood as the points or elements that are actual
members of the distribution.
A discrete random variable is a random variable whose probability distribution is
discrete. Similarly, a continuous random variable is a random variable whose
probability distribution is continuous.


a
a 


`c c 

c 
c
cc c cc c
   c c

  c
c cÀ   cccc 
c 
c
 c
`c c 

c 
c
cc c
cc c
   c
 c
  c
c cÀ  À 
  cc 
c 
c
 c
`c   

c


 cc cc c cc cc c  c c

c

 c c  c c c c

cc  c

 ccc  c cc  c cc

c  c
cc
c   cc c cc
 cc  c
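The convolution property for sums of independent variables has a discrete analogue that can be checked exactly for two independent die rolls (a sketch with exact arithmetic): convolving the pmf with itself gives the familiar triangular distribution of the total.

```python
from fractions import Fraction
from itertools import product

die = {k: Fraction(1, 6) for k in range(1, 7)}

# Discrete analogue of convolution: the pmf of the sum of two independent
# rolls is the sum over a + b = s of p(a) * p(b).
total = {}
for a, b in product(die, die):
    total[a + b] = total.get(a + b, Fraction(0)) + die[a] * die[b]

print(total[2], total[7], sum(total.values()))  # -> 1/36 1/6 1
```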

Common probability distributions

The following is a list of some of the most common probability distributions,
grouped by the type of process that they are related to. For a more complete list,
see list of probability distributions, which groups by the nature of the outcome
being considered (discrete, continuous, multivariate, etc.)

Note also that all of the univariate distributions below are singly peaked; that is, it
is assumed that the values cluster around a single point. In practice, actually
observed quantities may cluster around multiple values. Such quantities can be
modeled using a mixture distribution.
