2 pareto-distribution.nb
Introduction
Pareto's Proposal
Pareto proposed that the number of people (N) with incomes higher than x can be modeled log-linearly:
$\log N = \log A - a \log x$
Letting the total population be N0 and the minimum income be x0, so that log N0 = log A - a log x0, we can write this in proportionate terms as
$N / N_0 = (x / x_0)^{-a}$
Note that for x1, x2 > x0 and associated N1, N2, this also implies
$N_1 / N_2 = (x_1 / x_2)^{-a}$
The Pareto Distribution is often presented in terms of its survival function (or reliability function, or tail
function), which gives the probability of seeing larger values than x. (I.e., it is 1-CDF; see below.) The
survival function is
Clear[x0, a, x];
SurvivalFunction[ParetoDistribution[x0, a], x]
(x/x0)^-a    x >= x0
1            True
Here x0 > 0 is the location parameter, and a > 0 is the shape parameter (or slope parameter, or Pareto
index). We are only interested in x > x0, and we are usually interested in a > 1 (which is required for finite
expected value).
Relation to Exponential
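Suppose Y is exponentially distributed with rate a, and let X = x0 e^Y. Then for x > x0,
$P[X > x] = P[x_0 e^{Y} > x] = P[e^{Y} > x/x_0] = P[Y > \log(x/x_0)] = (x/x_0)^{-a}$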
Comparing to our survival function for the Pareto distribution, we see that X has a Pareto distribution.
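The change-of-variables argument can also be checked by simulation. The sketch below assumes Y is exponential with rate a and X = x0 e^Y; the parameter values, the sample size, and the threshold t are illustrative choices that do not come from the notebook.

```mathematica
Module[{k = 1., shape = 2., n = 10^5, t = 3., y, xs},
 SeedRandom[1];
 y = RandomVariate[ExponentialDistribution[shape], n];  (* Y with rate a *)
 xs = k Exp[y];                                         (* X = x0 e^Y *)
 {
  N[Count[xs, _?(# > t &)]/n],                          (* empirical P[X > t] *)
  SurvivalFunction[ParetoDistribution[k, shape], t]     (* Pareto tail (x0/t)^a *)
  }]
```

The two numbers should be close, with the gap shrinking as n grows.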
Log-Linear Survival
tailPareto = Simplify[
  SurvivalFunction[ParetoDistribution[x0, a], x],
  Assumptions -> x > x0 > 0]
(x0/x)^a
Note that the log of the survival probability is linear in log(x/x0). We can say that the size elasticity of the survival rate is -a. (We will return to this.)
Simplify[
  Log[tailPareto],
  Assumptions -> x > x0 > 0 && a > 0]
a Log[x0/x]
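The elasticity can be checked directly, since an elasticity is the derivative of a log with respect to a log, which equals x times the derivative of the log with respect to x. A minimal sketch, assuming x > x0 so that the tail is (x0/x)^a; the name logTail is illustrative and not used in the notebook.

```mathematica
Clear[x, x0, a]
logTail = a Log[x0/x];       (* log survival probability for x > x0 *)
Simplify[x D[logTail, x]]    (* d log S / d log x; evaluates to -a *)
```

The result is -a, so the elasticity has magnitude a: a 1% increase in size lowers the survival rate by about a percent.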
Out[370]=
{PlotRange -> {{0, 100000}, {0, 1}}, AxesLabel -> {Income, Survival Rate},
 ImageSize -> 250, Ticks -> {{20000, 40000, 80000}, {0, 0.25, 0.5, 1}}}
[Figure (Out[371]): two panels plotting Survival Rate against Income (ticks at 20 000, 40 000, and 80 000), with a "Pareto index" slider.]
Survival: Intuition
Consider a population of households and suppose sampling household incomes is like sampling from a
Pareto[10000,2].
What proportion of people earn more than $100000? From the form of the survival function, it should be
obvious that the answer is 1%: only 1 in 100 households earn more than $100000.
1/100
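The 1% figure comes from evaluating the survival function at $100000. A one-line check, using the Pareto[10000, 2] parameters from the text:

```mathematica
SurvivalFunction[ParetoDistribution[10000, 2], 100000]
(* (10000/100000)^2 = 1/100 *)
```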
Note: given a = 2 and any x0, we find that 1% of the population has income greater than 10*x0. This is why
the Pareto distribution (along with other power law distributions) is called scale free.
Simplify[
  SurvivalFunction[ParetoDistribution[x0, 2], 10*x0],
  Assumptions -> x0 > 0]
1/100
What is more, this relationship holds as well for subgroups: only 1% of the top 1% will have incomes again
ten times higher. This typifies a continuous power law distribution.
Simplify[
  SurvivalFunction[ParetoDistribution[x0, 2], 100*x0]/
   SurvivalFunction[ParetoDistribution[x0, 2], 10*x0],
  Assumptions -> x0 > 0]
1/100
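The same conditional calculation can be sketched for a general index a and a general multiple k. The symbol k is illustrative and does not appear in the original. Under the stated assumptions the ratio should reduce to k^-a, which does not depend on x0.

```mathematica
Clear[x0, a, k]
Simplify[
 SurvivalFunction[ParetoDistribution[x0, a], k^2 x0]/
  SurvivalFunction[ParetoDistribution[x0, a], k x0],
 Assumptions -> x0 > 0 && a > 0 && k > 1]
```

With a = 2 and k = 10 this is the 1% of the text.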
The cumulative distribution function (CDF) gives the probability of seeing a given size or lower. Note that x0 is a minimum value, called the location parameter.
cdfPareto = Simplify[
  CDF[ParetoDistribution[x0, a], x],
  Assumptions -> x > x0 > 0]
1 - (x0/x)^a
pdfPareto = Simplify[
  PDF[ParetoDistribution[x0, a], x],
  Assumptions -> x > x0 > 0]
pdfPareto == D[cdfPareto, x] // PowerExpand
x^(-1-a) x0^a a
True
[Figure (Out[1048]): CDF and PDF panels plotted against Size over 0 to 5, with a "Pareto index" slider (shown at 2.04).]
options02 =
  {PlotRange -> {{0, 5}, {0, 5}}, AxesLabel -> {"Size", "PDF"}, ImageSize -> 250};
Manipulate[
  LogLogPlot[PDF[ParetoDistribution[1, $$a], $x],
    {$x, 1, 5}, Evaluate[options02]],
  {{$$a, 1, "Pareto index"}, 0.1, 5}]
[Figure: log-log plot of the PDF against Size, with a "Pareto index" slider.]
A power law distribution with shape parameter a has probability density function
$p(x) = c\,x^{-(1+a)}$ for x > x0
Clear[c, x, a]
pdfPower = c x^-(1 + a);
The constant c must be chosen so that the density integrates to one (normalization). In the continuous case, we compute the integral
(c x0^-a)/a
{c -> x0^a a}
x^(-1-a) x0^a a
pdf /. soln
x^(-1-a) x0^a a
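The input cells for this step did not survive, so the following is a reconstruction sketch rather than the notebook's own code. It integrates the power law density over (x0, Infinity), solves for the constant that makes the total mass one, and substitutes it back. The names mass and soln and the assumptions x0 > 0 and a > 0 are choices made here.

```mathematica
Clear[c, x, x0, a]
pdfPower = c x^-(1 + a);
(* total probability mass above x0; the integral converges only for a > 0 *)
mass = Integrate[pdfPower, {x, x0, Infinity}, Assumptions -> x0 > 0 && a > 0]
soln = First[Solve[mass == 1, c]]   (* the constant that normalizes the density *)
pdfPower /. soln                    (* should match the Pareto PDF found earlier *)
```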
Average Size
pdfPareto
x^(-1-a) x0^a a
Use this PDF to compute the average size of a draw from the Pareto distribution as $\int_{x_0}^{\infty} x\,p(x)\,dx$.
Clear[x0, x, a]
meanPareto = Assuming[x0 > 0 && a > 1,
  Integrate[x*pdfPareto, {x, x0, +Infinity}]
  ]
(x0 a)/(-1 + a)
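As a cross-check, the built-in Mean can be applied to the distribution directly. The numeric parameters below are the Pareto[10000, 2] values used elsewhere in these slides; nothing else is assumed.

```mathematica
Clear[x0, a]
Mean[ParetoDistribution[10000, 2]]   (* x0 a/(a - 1) = 10000*2/1 = 20000 *)
Mean[ParetoDistribution[x0, a]]      (* general form; finite only when a > 1 *)
```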
$\int_{x_0}^{t} x\,p(x)\,dx$
Clear[x0, x, a, t]
cumSize = Assuming[t > x0 > 0 && a > 1,
  Integrate[x*pdfPareto, {x, x0, t}]
  ]
Dividing by the mean (i.e., the probability-weighted sum of all incomes) produces an expression for the proportion of total income constituted by incomes of t or less: $1 - (t/x_0)^{1-a}$.
Note that this expression only makes sense for a > 1, the case in which the mean exists. Also note that it depends only on the ratio t/x0.
1 - t^(1-a) x0^(-1+a)
True
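The division by the mean can be sketched end to end as follows. The names incomePdf, meanIncome, partialSum, and share are illustrative, and the numeric spot check (x0 = 1, a = 2, t = 4) is an arbitrary choice made here.

```mathematica
Clear[x, x0, a, t]
incomePdf = a x0^a x^(-1 - a);   (* Pareto density for x > x0 *)
meanIncome = Integrate[x incomePdf, {x, x0, Infinity},
   Assumptions -> x0 > 0 && a > 1];
partialSum = Integrate[x incomePdf, {x, x0, t},
   Assumptions -> t > x0 > 0 && a > 1];
share = Simplify[partialSum/meanIncome, Assumptions -> t > x0 > 0 && a > 1]
share /. {x0 -> 1, a -> 2, t -> 4}   (* 1 - (4/1)^(1 - 2) = 3/4 *)
```

In the spot check, incomes of at most four times the minimum account for three quarters of total income.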
Lorenz Curve
A Lorenz curve for income plots this cumulative share of income against the cumulative share of the population earning it.
We have just found that the cumulative share of income for incomes less than t can be written as $1 - (t/x_0)^{1-a}$. Recall that the CDF at income t, which gives the proportion of the population earning less than t, is $1 - (t/x_0)^{-a}$. So for any a, we can make a parametric plot of the Lorenz curve. Defining m = x0/t, we can write:
Manipulate[
  ParametricPlot[{1 - m^$a, 1 - m^($a - 1)}, {m, 0, 1},
    PlotRange -> {{0, 1}, {0, 1}}, AspectRatio -> 1, ImageSize -> Small],
  {{$a, 2}, 1.01, 5}]
[Figure: the Lorenz curve on the unit square, with a slider for $a.]
Lorenz Curve
Our parametric representation of the Lorenz curve can be used to derive a function, which expresses the
cumulative share of income as a function of the cumulative share of the population earning it. Recall that
the CDF at income t gives the proportion of the population (say, sHtL) earning less than t. Since the CDF is
strictly increasing, we can produce the inverse function tHsL, which we can then substitute into cumshare.
Solve::ifun :
Inverse functions are being used by Solve, so some solutions may not be found; use Reduce for complete solution information.
{t -> (1 - s)^(-1/a) x0}
1 - (1 - s)^(1 - 1/a)
[Figure: the Lorenz curve on the unit square, with a slider for $a.]
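The inversion step can be sketched as follows. The input cells are not in the extracted text, so the names cdfAtT, shareBelowT, tOfS, and lorenz are assumptions made here. Solve is expected to issue the inverse-function warning quoted in the slide.

```mathematica
Clear[s, t, x0, a]
cdfAtT = 1 - (x0/t)^a;             (* population share earning less than t *)
shareBelowT = 1 - (t/x0)^(1 - a);  (* income share of incomes below t *)
tOfS = First[Solve[s == cdfAtT, t]]   (* income level t at population share s *)
lorenz = Simplify[shareBelowT /. tOfS,
  Assumptions -> 0 < s < 1 && x0 > 0 && a > 1]
```

The last expression should reduce to 1 - (1 - s)^(1 - 1/a), possibly after PowerExpand. The minimum income x0 drops out, so the Lorenz curve depends only on the Pareto index.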
Gini Coefficient
The Gini coefficient is twice the area between the 45-degree line and the Lorenz curve. We can calculate that area as
Assuming[a > 1,
  ]
gini = 2*% // Simplify
1/(-2 + 4 a)
1/(-1 + 2 a)
Solve[g == gini, a]
{{a -> (1 + g)/(2 g)}}
Solve[d == 1 - 1/a, a] // Flatten
gini /. % // Simplify
Solve[g == %, d]
{a -> 1/(1 - d)}
(1 - d)/(1 + d)
{{d -> (1 - g)/(1 + g)}}
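The body of the area integral is missing from the extracted text. One way to obtain the reported results, assuming the Lorenz function 1 - (1 - s)^(1 - 1/a) derived earlier, is to integrate the gap between the 45-degree line and the Lorenz curve over the unit interval. The names lorenz and area are choices made here.

```mathematica
Clear[s, a]
lorenz = 1 - (1 - s)^(1 - 1/a);
area = Assuming[a > 1, Integrate[s - lorenz, {s, 0, 1}]];
gini = Simplify[2 area]   (* the slide reports 1/(-1 + 2 a) *)
gini /. a -> 2            (* 1/3 *)
```

For a = 2, the index used in the earlier examples, the Gini coefficient is 1/3.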
[Figure: Lorenz curve on the unit square, with a "Pareto index" slider.]
80-20 Rule: Pareto (1906) noticed that about 80% of the land in Italy was owned by about 20% of the population.
However, his British tax return data showed something closer to 70-30.
There will always be some such proportion: look for where the Lorenz curve crosses the unit simplex (the line from (0, 1) to (1, 0)).
80-20 Rule
We have seen with a = 2 that 1% of the population has a size at least 10 times the minimum, and 1% of that 1% has a size 10 times that.
More generally, if a > 1 (so that the expected value is finite), there is some fraction 0 ≤ f ≤ 1/2 such that f of those sampled receive (1 - f) of all income. Similarly, writing p for that fraction, for every real (not necessarily integer) n > 0, 100 p^n % of all people receive 100 (1 - p)^n % of all income.
Assuming[a > 1,
  Solve[1 - s == 1 - (1 - s)^(1 - 1/a), s]
  ]
Assuming[1 > d > 0,
  Solve[s == (1 - s)^d, s]
  ]
Assuming[d > 0,
  Solve[s + s^(1/d) == 1, s]
  ]
Solve::nsmet : This system cannot be solved with the methods available to Solve.
Solve[1 - s == 1 - (1 - s)^(1 - 1/a), s]
Solve::nsmet : This system cannot be solved with the methods available to Solve.
Solve[s == (1 - s)^d, s]
Solve::nsmet : This system cannot be solved with the methods available to Solve.
Solve[s + s^(1/d) == 1, s]
Solve[1 - s == 1 - (1 - s)^(1 - 1/a), a]
Solve::ifun :
Inverse functions are being used by Solve, so some solutions may not be found; use Reduce for complete solution information.
{{a -> Log[1 - s]/(Log[1 - s] - Log[s])}}
Solve[s + s^(1/d) == 1, d]
Solve::ifun :
Inverse functions are being used by Solve, so some solutions may not be found; use Reduce for complete solution information.
{{d -> Log[s]/Log[1 - s]}}
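Although Solve cannot isolate s symbolically, the relation is easy to use numerically. The sketch below evaluates the closed form for a at two splits and then finds s for a given index with FindRoot. The helper name indexFor and the choice a = 2 are illustrative. Here s is the bottom share of the population that receives the share 1 - s of income.

```mathematica
Clear[s]
(* Pareto index implied by an s : (1 - s) split, from the Solve result for a *)
indexFor[s_] := Log[1 - s]/(Log[1 - s] - Log[s])
N[indexFor[8/10]]   (* 80-20 split: about 1.16 *)
N[indexFor[7/10]]   (* 70-30 split: about 1.42 *)

(* Conversely, the split implied by a = 2, i.e. s == (1 - s)^(1 - 1/2) *)
FindRoot[s == (1 - s)^(1 - 1/2), {s, 0.5}]   (* s is about 0.618 *)
```

An index near 1.16 therefore corresponds to the 80-20 rule, while a = 2 implies a much more even split of roughly 62-38.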
We can generate a sample from a Pareto distribution by sampling from a uniform distribution on (0,1].
We transform each point U from the uniform according to $X = x_0 / U^{1/a}$.
Then looking at the survival function we have
$P[X > x] = P[x_0 / U^{1/a} > x] = P[U^{1/a} < x_0/x] = P[U < (x_0/x)^a] = (x_0/x)^a$
Technical note: we must rule out drawing a 0 from our uniform distribution. Most software draws from the interval [0, 1). In this case, just use 1 - U for your sample.
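A quick sanity check of this transform is to compare a sample statistic with its theoretical value. The sketch below uses the median. The parameter values, the seed, and the sample size are arbitrary choices made here, and 1 - U is used so that the uniform draw is never 0.

```mathematica
Module[{k = 10000, shape = 2, n = 10^5, u, xs},
 SeedRandom[7];
 u = 1 - RandomReal[1, n];    (* maps draws from [0, 1) into (0, 1] *)
 xs = k/u^(1/shape);          (* X = x0/U^(1/a) *)
 {Median[xs], N[Median[ParetoDistribution[k, shape]]]}]
```

The theoretical median is x0 2^(1/a), about 14142 for these values, and the sample median should land close to it.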
Clear[sizedata, incomes]
alpha = 2; xmin = 10000; npts = 2000;
sizedata = xmin/(1 - RandomReal[1, npts])^(1/alpha);
noise = RandomVariate[NormalDistribution[0, 100], npts];
sizedata = Sort[sizedata + noise];
Clear[noise]
proportionlarger = Reverse[Range[npts]]/npts // N;
survivaldata = Transpose[{sizedata, proportionlarger}] // N;
ListPlot[survivaldata]
llplot = ListPlot[Log[survivaldata]]
[Figures: the empirical survival data, and the same data with both axes logged.]
An obvious estimator for x0 is the minimum observation (which is also the maximum likelihood estimator).
Recall that the mean of the distribution is x0 a/(a - 1). We can then estimate a using
Clear[mean, x0, a]
minsize = Min[sizedata]
{meansize = Mean[sizedata], theoreticalmean = xmin*alpha/(alpha - 1)}
Solve[mean == x0*a/(a - 1), a] /. {mean -> meansize, x0 -> minsize}
9792.57
{{a -> 2.04883}}
Not bad.
We might improve a little on this by estimating x0 using the expected value of the minimum observation given the sample size.
(See http://www.math.umt.edu/gideon/pareto.pdf)
Recall that the survival function told us that the log of the proportion surviving is linear in log(x0/x). So we can look for a simple linear fit. The negative of the coefficient on x is our estimate of the Pareto index.
alpha
fit = Fit[Log[survivaldata], {1, x}, x] (* linear fit to logged data *)
coefs = CoefficientList[fit, x]
Show[{llplot,
  Plot[fit, {x, Log[First[sizedata]], Log[Last[sizedata]]}, PlotStyle -> {Red}]}]
Exp[-coefs[[1]]/coefs[[2]]] (* implied value of x0 *)
18.7679 - 2.03951 x
{18.7679, -2.03951}
[Figure: logged survival data with the fitted line in red.]
9918.88
FittedModel[
  9.66847×10^7/$x^1.9…    $x > 9949.45
  1                        True
]
We are often forced to work with binned data. Let's create some.
[Figure: two plots of relative frequency by bin.]
In our last slide, we cheated a bit by showing only the bins for relatively small sizes, which occur with the greatest frequency. As size increases, relative frequency falls, and statistical noise becomes more prominent, even if we substantially increase bin size.
xmax = 200*x0; bins = Table[i, {i, x0, xmax, xmax/100}];
relfreq = BinCounts[x, {bins}]/npts;
mypoints = Transpose[{Rest[bins], relfreq}];
ListLinePlot[mypoints, PlotRange -> {{100*x0, Automatic}, {0, 10^-5}}]
[Figure: relative frequency for sizes from 100 to 200; vertical axis from 0 to 0.00001.]
We might hope to address this by moving to a log scale. This proves informative but is only partially successful. Why? Our bins are still linear.
Notice the empty bins for large sizes.
ListLogLogPlot[mypoints]
[Figure: log-log plot of relative frequency against size; horizontal axis from 5 to 200, vertical axis from 10^-5 to 0.1.]
Logarithmic Binning
It works better to let our bin size grow as we consider larger size realizations: we can use logarithmic
binning.
[Figure: logarithmically binned frequencies; horizontal axis from 1 to 5, vertical axis from -12 to -2.]
Clear[x]
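The binning code for this slide is not in the extracted text. A minimal sketch of logarithmic binning, under assumptions made here (a Pareto[1, 2] sample, bin edges that double at each step, and counts divided by bin width so that wide bins are not overstated), could look like this:

```mathematica
Module[{k = 1, shape = 2, n = 10^5, data, edges, counts, widths, density, centers},
 SeedRandom[3];
 data = RandomVariate[ParetoDistribution[k, shape], n];
 edges = k*2.^Range[0, 12];                 (* bin edges 1, 2, 4, ..., 4096 *)
 counts = BinCounts[data, {edges}];         (* counts per bin [edge_i, edge_i+1) *)
 widths = Differences[edges];
 density = counts/(n widths);               (* relative frequency per unit of size *)
 centers = Sqrt[Most[edges] Rest[edges]];   (* geometric midpoint of each bin *)
 (* drop empty bins, whose log density is undefined *)
 ListLogLogPlot[Select[Transpose[{centers, density}], Last[#] > 0 &]]]
```

Because each bin is twice as wide as the one before, the large-size bins collect enough observations to keep the tail of the plot from dissolving into noise.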
Census Data
incomes2010 = {11904, 20000, 49445, 100065, 138923, 180810};
cdf2010 = {10, 20, 50, 80, 90, 95}/100.;
tail2010 = 1 - cdf2010;
incomecdf2010 = Transpose[{incomes2010, cdf2010}];
incometail2010 = Transpose[{incomes2010, tail2010}];
Labeled[GraphicsRow[{
   g`cdf2010 =
    ListPlot[incomecdf2010, AxesLabel -> {"income", "cdf"}, AxesOrigin -> {0, 0},
     PlotStyle -> PointSize[0.02], Ticks -> {{{15000, "$15k"}, {50000, "$50k"},
        {100000, "$100k"}, {150000, "$150k"}}, Automatic}, ImageSize -> 400],
   g`tail2010 = ListPlot[incometail2010, AxesLabel -> {"income", "tail"},
     AxesOrigin -> {0, 0},
     PlotStyle -> PointSize[0.02], Ticks -> {{{15000, "$15k"}, {50000, "$50k"},
        {100000, "$100k"}, {150000, "$150k"}}, Automatic}, ImageSize -> 400]
   }], "2010 Census Data"]
[Figure (Out[722]): "2010 Census Data", with panels plotting cdf and tail against income.]
Out[810]=
17914.6 (1/x)^1.01727    x > 15169.9
1                        True
[Figure (Out[812]): panels plotting tail and cdf against income.]
A problem with this log-linear survival fit is that it estimates minimum income at a value above the minimum observed value. But the same thing happens with a nonlinear estimation.
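The fitting code behind the log-linear estimate is not in the extracted text. The sketch below shows one way to run such a fit on the six census points. It is an assumption that all six points enter the fit, so it need not reproduce the numbers reported on the slide, and the names logfit, aHat, and x0Hat are illustrative.

```mathematica
incomes2010 = {11904, 20000, 49445, 100065, 138923, 180810};
tail2010 = 1 - {10, 20, 50, 80, 90, 95}/100.;
Clear[u]
(* regress log tail probability on log income *)
logfit = Fit[N[Log[Transpose[{incomes2010, tail2010}]]], {1, u}, u]
coefs = CoefficientList[logfit, u];
aHat = -coefs[[2]]                    (* estimated Pareto index *)
x0Hat = Exp[-coefs[[1]]/coefs[[2]]]   (* income at which the fitted tail equals 1 *)
```

Comparing x0Hat with the smallest observed income, 11904, shows whether the fitted minimum lies above the data in the way the text describes.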
{49445, 0.5}, {100065, 0.8}, {138923, 0.9}, {180810, 0.95}}
nlm01 = NonlinearModelFit[incomecdf2010,
   CDF[ParetoDistribution[khat, ahat], $x], {khat, ahat}, $x];
nlm01["BestFitParameters"]
gpareto2010 = Show[ListPlot[incomecdf2010, AxesOrigin -> {0, 0}],
  Plot[nlm01[$x], {$x, 0, 200000}, PlotStyle -> {Red}]]
[Figure (Out[698]): the data points with the fitted Pareto CDF in red.]
[Figure (Out[1026]): cdf against income (ticks at $15k, $50k, $100k).]
Out[1028]=
8.43993 - 0.872352 x
Out[1029]=
{8.43993, -0.872352}
Out[1030]=
0.872352
Out[1031]=
15913.4
Out[1032]=
1 - 4628.25 (1/x)^0.872352
[Figure (Out[1034]): cdf against income (ticks at $15k, $50k, $100k).]
Let's fit these data points to a Pareto distribution, using NonlinearModelFit. (Mathematica 9 gives a perfect match to the same estimation in Maclachlan (2006), who used Mathematica 5.)
{35000, 0.402}, {50000, 0.55}, {75000, 0.733}, {100000, 0.843}}
nlm01 = NonlinearModelFit[data2004fm,
   CDF[ParetoDistribution[khat, ahat], $x], {khat, ahat}, $x];
nlm01["BestFitParameters"]
g`pareto =
  Show[{g`data2004fm, Plot[nlm01[$x], {$x, 0, 100000}, PlotStyle -> {Red}]}]
Out[1040]=
True
[Figure (Out[1043]): cdf against income (ticks at $15k, $50k, $100k), with the fitted Pareto CDF in red.]
Puzzle
nlm01["BestFitParameters"]
nlm01["ParameterConfidenceIntervals"]
nlm01["ParameterErrors"]
{3192.72, 0.212397}
Lognormal
In[944]:=
GraphicsRow[{
   Plot[SurvivalFunction[LogNormalDistribution[0, .2], x],
    {x, 0, 2}, AxesOrigin -> {0, 0}],
   LogLogPlot[SurvivalFunction[LogNormalDistribution[0, .2], x],
    {x, 0.01, 2}, PlotRange -> All]
   }, ImageSize -> Large]
GraphicsRow[{
   Plot[CDF[LogNormalDistribution[0, .2]][x], {x, 0, 2}, AxesOrigin -> {0, 0}],
   Plot[PDF[LogNormalDistribution[0, 0.2]][x], {x, 0, 2}, AxesOrigin -> {0, 0}]
   }, ImageSize -> Large]
[Figure (Out[944]): the lognormal survival function on linear axes and on log-log axes.]
In[949]:=
Simplify[
  {CDF[LogNormalDistribution[m, s], x], PDF[LogNormalDistribution[m, s], x]},
  Assumptions -> x > 0]
Out[949]=
{1/2 Erfc[(m - Log[x])/(Sqrt[2] s)], E^(-(m - Log[x])^2/(2 s^2))/(Sqrt[2 Pi] x s)}
[Figure (Out[945]): the lognormal CDF and PDF for x from 0 to 2.]
Fit Lognormal
In[983]:=
Clear[x]
model = CDF[LogNormalDistribution[m, s], x]
nlm02 = NonlinearModelFit[data2004fm, model, {{m, 10}, {s, 1}}, x]
nlm02["BestFitParameters"]
fittedmodel2004fm = model /. %
Out[984]=
1/2 Erfc[(m - Log[x])/(Sqrt[2] s)]    x > 0
0                                      True
Out[986]=
{m -> 10.6606, s -> 0.92634}
[Figure (Out[1039]): cdf against income (ticks at $15k, $50k, $100k), with the fitted lognormal CDF.]
In[1044]:=
g2 = Show[{g`lognormal, g`pareto},
  PlotLabel -> "Compare Lognormal (Blue) and Pareto (Red)"]
[Figures (Out[1044], Out[1046]): fitted lognormal (blue) and Pareto (red) CDFs against income (ticks at $15k, $50k, $100k).]
Beaman CDF
Limit[beamanCDF, x -> Infinity]
Limit[beamanCDF, x -> a]
Limit[beamanCDF, x -> b]
D[beamanCDF, x] // Simplify
Out[365]=
1/(1 + (-a + b)^3/(-a + x)^3)
Out[366]=
1
Out[367]=
0
Out[368]=
1/2
Out[369]=
(3 (-a + b)^3)/((1 + (a - b)^3/(a - x)^3)^2 (a - x)^4)
Beaman
$F(x) = \dfrac{1}{1 + \dfrac{(b-a)^3}{(x-a)^3}}$
Beaman
Maclachlan (2006) reports that the Beaman distribution was used at duPont in the 1970s to model sales volume of products at various price points, because it gave a better fit than the lognormal. There are two parameters: a represents the minimum price point and b represents the median price point. The distribution allows negative values. See http://library.wolfram.com/infocenter/MathSource/6021/ for details. For elaboration, see http://home.manhattan.edu/~fiona.maclachlan//beaman/beaman_notebook/.
beamanDistribution[$x_, a_, b_] := 1/(1 + (b - a)^3/($x - a)^3);
incomepoints = {{15000, 0.154}, {25000, 0.283}, {35000, 0.402},
   {50000, 0.55}, {75000, 0.733}, {100000, 0.843}};
nlm03 = NonlinearModelFit[incomepoints,
   {1/(1 + (b - a)^3/($x - a)^3), a < 0, b > 25000}, {b, a}, $x];
Normal[nlm03]
bfp = nlm03["BestFitParameters"]
1/(1 + 3.48475×10^14/(25583.6 + $x)^3)
{b -> 44786.9, a -> -25583.6}
These results are close to those of Maclachlan (who got the values {b -> 44884.7, a -> -27515.7}). Unfortunately, the estimates are *highly* sensitive to the constraint values. In any case, we use constraints based on Maclachlan (2006) and get a pretty good fit, as illustrated here.
[Figures: two plots of cdf against income (ticks at $15,000, $50,000, and $100,000).]
References
Maclachlan, Fiona (2006). Investigating Power Laws with Mathematica. http://library.wolfram.com/infocenter/Conferences/6461/