
Computing Probability Density Function

Rajesh Karan
Riskraft Consulting Ltd
Chakala, Andheri (E), Mumbai-93
October 30, 2007

In this note I wish to introduce various numerical methods to compute density
functions. Although these standard methods are discussed in various textbooks,
the purpose of this note is to collect them in one place and, most importantly,
to keep the discussion as easy as possible so that it is accessible to beginners.
We will also see some not-so-conventional methods, which will be a delight for the
experts. Before we start, let us create a common ground from which we will take off
to the specific topics. Below, we will review some of the terminology we already
know. The concepts will be presented in as random an order as possible. Then we
will try to be coherent and summarize what we have said. We will also discuss the
directions one can move in, and then we will return to the main point.

1 Randomness
Have you come across a situation where we say "random" and somebody asks us to
explain what we actually mean by it? All of us have a certain intuition about
this word, but we all know that when it comes to defining it mathematically we find
ourselves in difficulty. The most meaningful definition of randomness that comes to
my mind is associated with observations (outcomes) which cannot be predicted by humans.
The definition of being random depends on perception, and on the horizon up to which
one can think and explore. In this article, we will not try to explore the various other
interpretations of randomness. We will just assume that whatever observation we
are going to make next is unpredictable for sure. Such observations are truly
random in nature. Randomness is then a measure of how sure we are about
the unpredictability. Thus if we are 0% sure that an observation is unpredictable, then
this is equivalent to saying that we are 100% sure that the observation is predictable, i.e.
the observation is not random. However, we have a problem categorizing this sureness itself.
We will see later on how to classify these things.

1.1 let us think..
Let us divert from statistics, and also from the main topic of this note, and talk
about the digital cameras available in the market these days. We have built-in digital
cameras in mobile phones, in webcams, in consumer electronics, etc. We have all seen
various advertisements like 6.1 megapixel, 12.5 megapixel, and so on. Have you ever
thought about this megapixel war? Branded companies ask for much higher prices than
similar offerings from smaller, maybe unbranded, companies. If you have ever bought
one of the unbranded digital cameras and got a chance to compare it with the result
of a branded one, you might have seen that these unbranded or low-budget cameras with
the same megapixel count in fact produce different results. If you have not seen such
results, then let me motivate you in another direction. Have you ever seen advertisement
billboards? You might have observed them changing over the years. Earlier, large
advertisement boards were painted by artists, but now you see actual photographs
of the models. Have you ever wondered how it is possible to print such a large picture,
when the camera is just a small device?
Well, let us look into these observations and see what happens when we take
a photograph with a digital camera and how it gets printed up to larger-than-life
sizes. I was talking about the picture quality of branded and non-branded digital
cameras. Up front, let me tell you that these differences are largely due to the lenses
used in the camera, which you feel in terms of price. For example, prices differ by a
huge margin depending on whether a camera from Olympus is using lenses made by Olympus
or lenses made by Leica. The same camera will produce different results due to the lens
design. However, if you use the same lens in different cameras, say Sony and Olympus
bodies both using a Leica lens, you will still see differences in the images produced.
By changing cameras while keeping a particular lens, you can now find the difference
between the cameras themselves. The reason these cameras differ is related to the
internal arrangement of the CCD sensor and the signal-processing algorithm embedded
in the camera chip. The CCD, which is an acronym for Charge-Coupled Device, has various
replacements, like CMOS sensors. These sensors have a grid-like arrangement of light-receptive
centers. The individual sensors are called pixels. The total number of pixels embedded in the
sensor is the so-called megapixel count. But the megapixel count available to the consumer is
different from this actual count. The unbranded or low-budget cameras have smaller
sensors, and therefore a lower count of actual pixels, which is then analysed by the signal-processing
chips and converted to a bigger count. These are called effective megapixels.
So the next time you plan to buy a good camera, look out for the sensor size,
the actual pixel count and the effective pixel count. Shortly we will see what would
be a good buy, and we will appreciate why, in this competitive market of digital cameras
offering big counts, a good-quality camera still demands a good budget. That means
you need to pay for the quality.
Let us just think about a square sensor with one pixel. There are two states of the
pixel: either it has received light and is sending an electric signal, or it has not
received any light. That is, the pixel is either on or off. With this one-pixel camera, we
have only two kinds of photographs: one totally black (off state) and the other totally
white (on state). It is impossible to tell whether we have taken a picture of a
human or an animal; everything looks the same in the photograph. Now let us divide the
same sensor into four pixels, i.e. four pixels each with one fourth of the original
area. With these four pixels, we still cannot differentiate between a human being and an
animal, but we can differentiate between letters like L, where three pixels are on
and one is off, and I, where two pixels are on and two are off. Thus increasing the
number of pixels increases the amount of information content in the image. Further, if
we replace each of these quarter-size pixels by pixels of one fourth their size, i.e. the
entire sensor has 16 pixels instead of 4, we see that the amount of information differs
between the arrangements.
Now let us pause and think: what if I have only a 4-pixel camera and I wish to
produce an effect similar to a 16-pixel camera? Is it possible? The answer is yes.
We can create the effect, but we cannot fully match the 16-pixel camera. This is exactly
why low-budget cameras do not produce results as good as branded, higher-budget
cameras. We will see how to do it. Our problem is shown below

+-------+-------+ +---+---+---+---+
| | | | a | b | c | d |
+ A + B + +---+---+---+---+
| | | | e | f | g | h |
+-------+-------+ => +---+---+---+---+
| | | | i | j | k | l |
+ C + D + +---+---+---+---+
| | | | m | n | o | p |
+-------+-------+ +---+---+---+---+

4 pixel 16 pixel

The letters written in the blocks are the states of the pixels. Thus, if in the 4-pixel
sensor A denotes the ON state, then the equivalent representation in the 16-pixel sensor
must also give the ON state for a given image. But here is a difference; let us look at
the following logic:

if any one of (a,b,e,f) is ON then A is ON : we have 4 possibilities
if any two of (a,b,e,f) are ON then A is ON : we have 6 possibilities
if any three of (a,b,e,f) are ON then A is ON : we have 4 possibilities
if all four of (a,b,e,f) are ON then A is ON : we have 1 possibility
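These possibility counts are just binomial coefficients, which a short Python check (purely illustrative, not part of any camera pipeline) can confirm:

```python
from itertools import combinations

# The four 16-pixel cells (a, b, e, f) that map onto the single 4-pixel cell A.
cells = ["a", "b", "e", "f"]

# Count, for k = 1..4, how many ways exactly k of the four cells can be ON.
possibilities = {k: len(list(combinations(cells, k))) for k in range(1, 5)}
print(possibilities)                 # {1: 4, 2: 6, 3: 4, 4: 1}
print(sum(possibilities.values()))   # 15 distinct ON-images collapse into A=ON
```

So fifteen distinct sub-pixel images are all reported by the coarse sensor as the single state A = ON.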

A similar logic applies to the other pixels' states. For a given image photographed
at 16 pixels, 15 different (= 4 + 6 + 4 + 1) possibilities are represented by a single
possibility of the 4-pixel sensor. If our intention is to recreate the true image
(i.e. one of the 15 states) from this one piece of information, it is difficult. Here
the signal-processing algorithm comes to help: other ideas and knowledge are used to
infer what the best guess could be. The algorithm we are going to discuss below may
not give you evident results, but for a slightly larger pixel count you can see that
prediction becomes easier, though not accurate. We start by looking at the amount of
current produced at pixel A, which can be categorized into five grades (zero current,
plus the four slots below). Assume the maximum current produced in one pixel is 1 µA.
Then we can make slots as follows: (0, 0.25], (0.25, 0.5], (0.5, 0.75], (0.75, 1.0] µA.
The measured current will fall into one of these categories. This resolves how many
sub-pixels are in the ON state, thereby reducing the number of possibilities. If, for
example, the current measured at pixel A is in the range (0.75, 1.0] µA, we can immediately
set all four equivalent pixels (a, b, e, f) to the ON state. When the current falls in
the (0.5, 0.75] µA range, we have 4 choices out of 15. We then use another piece of
information: assuming the contrast of the image does not drop at the boundary of the
pixels, if pixels B, C, D are in the ON state then it is natural to think that the
pixels (b, e, f) will be in the ON state. If only pixels (C, D) are ON and we have
slotted pixel A into (0.5, 0.75] µA, then there is a possibility that pixel f is in
the OFF state and pixels (a, b, e) are in the ON state. Thus if we have some information
about the image then, with the help of a smaller set of information, we can try to
recreate the actual information. Readers at this point may be puzzled by the rules
I have just discussed, and utterly confused about the relation between the main topic
of this note and digital camera sensors. Therefore, let me come back and identify some
resemblance between these two concepts.
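The current-slotting step can be sketched in a few lines of Python. This is my own minimal rendering of the grading logic described above (the function name and the clipping behaviour are assumptions), mapping a measured current to the number of ON sub-pixels:

```python
def on_pixels_from_current(current_uA, max_uA=1.0):
    """Map a 4-pixel cell's measured current to the number of ON sub-pixels.

    Slots follow the text: 0, then (0, 0.25], (0.25, 0.5], (0.5, 0.75],
    (0.75, 1.0] uA, corresponding to 0, 1, 2, 3 or 4 sub-pixels being ON.
    """
    fraction = current_uA / max_uA
    if fraction <= 0:
        return 0
    for k in range(1, 5):            # find the smallest slot containing it
        if fraction <= 0.25 * k:
            return k
    return 4                         # clip anything above the maximum

print(on_pixels_from_current(0.9))   # 4: all of (a, b, e, f) must be ON
print(on_pixels_from_current(0.6))   # 3: exactly one sub-pixel is OFF
```

With the slot identified, only the arrangements with that many ON sub-pixels remain as candidates, which is exactly the reduction of possibilities described in the text.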
We were talking about a lack of knowledge, which holds the key to randomness in
observations. We were also talking about finding the states of the pixels in a camera
sensor. We saw that there is more than one possibility if we find that one of the pixels
is in the ON state. However, we are sure that, no matter what, there is one and only
one image and therefore only one set of information. Suppose we increase
the number of pixels to 64; we will get more information than with the 16-pixel sensor,
and the number of states it can hold is 2^64. To construct this information from the 2^16
states of the coarser sensor, we need some kind of guess about the states of the effective
pixels. We may cross-check a state against various other correlated pieces of information.
But the truth is, it would be pure luck if we could reconstruct the actual set of information
starting from the smaller set. In practice, we cannot reproduce the actual information,
because we lack complete knowledge. In some sense, the information hidden in a
pixel is a superposition of the states of the effective pixels it can hold. That is, for
the 4- and 16-pixel example we can write

state( A ) = some function of (
                 state (a), state (b), state (e), state (f)
             )
           = F(a,b,e,f) mathematically

If it is possible to invert the function F() defined above, then we know the states of
the individual smaller pixels. If you are still wondering about the motivation, then
think of A as the average of four numbers (a, b, e, f), all of which are real numbers.
A real number A can be written as a sum of four real numbers, or as the average of four
real numbers, and the magnitude of a pixel's current is represented by a real number. Let
us do some mathematics and see what I have been talking about. Consider the states of
(a,b,e,f) in the 16-pixel sensor, and denote the ON state by one and OFF by zero. Then
the following is true

a b e f   A'=a+b+e+f   A   F1(A')    F2 = int( w1*a + w2*b + w3*e + w4*f )
0 0 0 0       0        0   A'*0.00   int( 0.0*0 + 0.0*0 + 0.0*0 + 0.0*0 ) = 0
1 0 0 0       1        1   A'*1.00   int( 1.0*1 + 1.0*0 + 1.0*0 + 1.0*0 ) = 1
0 1 0 0       1        1   A'*1.00   int( 1.0*0 + 1.0*1 + 1.0*0 + 1.0*0 ) = 1
0 0 1 0       1        1   A'*1.00   int( 1.0*0 + 1.0*0 + 1.0*1 + 1.0*0 ) = 1
0 0 0 1       1        1   A'*1.00   int( 1.0*0 + 1.0*0 + 1.0*0 + 1.0*1 ) = 1
1 1 0 0       2        1   A'*0.50   int( 0.6*1 + 0.6*1 + 0.6*0 + 0.6*0 ) = 1
1 0 1 0       2        1   A'*0.50   int( 0.6*1 + 0.6*0 + 0.6*1 + 0.6*0 ) = 1
1 0 0 1       2        1   A'*0.50   int( 0.6*1 + 0.6*0 + 0.6*0 + 0.6*1 ) = 1
0 1 1 0       2        1   A'*0.50   int( 0.6*0 + 0.6*1 + 0.6*1 + 0.6*0 ) = 1
0 1 0 1       2        1   A'*0.50   int( 0.6*0 + 0.6*1 + 0.6*0 + 0.6*1 ) = 1
0 0 1 1       2        1   A'*0.50   int( 0.6*0 + 0.6*0 + 0.6*1 + 0.6*1 ) = 1
1 1 1 0       3        1   A'*0.33   ....
1 1 0 1       3        1   A'*0.33   ....
0 1 1 1       3        1   A'*0.33   ....
1 0 1 1       3        1   A'*0.33   ....
1 1 1 1       4        1   A'*0.25   ....

Our observations can be of the following kind:

• we can choose any mapping of (a,b,e,f) which gives us the value of A properly.

• there can be an infinite number of such possibilities; two of these are shown above.

• depending on the function which maps (a,b,e,f) to A, we get different
sets for (a,b,e,f). For example, if the set of coefficients (0.6,0.6,0.6,0.6) is replaced
by (0.8,0.2,0.5,0.9) then it is possible to pick out a preferred pixel.

• the coefficients w_i introduced in F2 are some kind of weights multiplied by
the states as observed for (a,b,e,f).

• continuing this way, and assuming there are a large number of such states denoted
by x, with w(x) written as the weight of state x, we wish to write pixel A's
state as

      A = Σ_{i=1}^{N} w(x_i) × x_i

• the equation given above resembles the formula for the mean of a set of numbers x_i
whose probability distribution function is given by w(x_i).

• since the states x_i are not known a priori, nor is it possible to infer them from
the above equation when only the left-hand side and/or a few terms on the
right-hand side are known, we decide to call the states x_1 = a, x_2 = b,
x_3 = e and x_4 = f random variables, and the set {x_i; i = 1, 2, · · · , n} a set
of random numbers. The function w(x_i) represents the probability distribution
of the random variable x_i.
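The weighted-sum formula in the bullets above can be made concrete with a tiny sketch. The states and weights here are hypothetical numbers of my own choosing; with equal weights the formula reduces to the ordinary mean:

```python
# Hypothetical sub-pixel states x_i and weights w(x_i); the weighted sum
# plays the role of A = sum_i w(x_i) * x_i from the bullet above.
states  = [1, 0, 1, 1]               # states of (a, b, e, f), say
weights = [0.25, 0.25, 0.25, 0.25]   # equal weights reproduce the mean

A = sum(w * x for w, x in zip(weights, states))
print(A)  # 0.75 -- the average of the four states
```

Changing the weights away from 1/4 (say to favour pixel a) is exactly the "preferential pixel" idea mentioned earlier.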

1.2 we already know something..
To make a further connection with the main topic, let us become slightly incoherent and
discuss what we already know about density functions. The probability distribution
function closest to the heart of many people working in this field is the log-normal,
which is a special case of the Gaussian function. Therefore, I will choose the Gaussian
function as a starting point. Let x be a continuous random variable for which we define
the distribution function as the Gaussian

      G(x; µ, σ) = (1 / (σ√(2π))) exp( −( (x − µ) / (√2 σ) )² )        (1)

where µ is the mean and σ is the standard deviation of the variable x. The constant
multiplier 1/(σ√(2π)) is needed to make sure that G(x; µ, σ), when integrated over x
in the range (−∞, ∞), is normalized to one. Let us now redefine the variable x by the
relation

      x = √2 σ y + µ.
Equation 1 is now written as

      G(y) = (1/√π) exp( −y² ).        (2)

This transformation is done only to make the discussion simpler, while keeping the
requirements intact. Recall that if we regard y as the logarithm of another variable, say
z, then the resulting pdf of z will be known as the log-normal pdf. Apart from the
normalization coefficient, the Gaussian pdf, plotted against its argument, looks like
figure 1(a), and the corresponding line chart like figure 1(b).
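The normalization claim for equation (1) is easy to check numerically. The sketch below (my own illustrative code, using a wide finite interval to stand in for (−∞, ∞)) integrates the standard Gaussian with the trapezoidal rule:

```python
import math

def gaussian(x, mu=0.0, sigma=1.0):
    """Gaussian pdf of equation (1)."""
    z = (x - mu) / (math.sqrt(2.0) * sigma)
    return math.exp(-z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Trapezoidal integration over [-10, 10] stands in for (-inf, inf);
# the tails beyond +/-10 sigma are negligibly small.
n, lo, hi = 20000, -10.0, 10.0
h = (hi - lo) / n
area = sum(gaussian(lo + i * h) for i in range(n + 1)) * h \
       - 0.5 * h * (gaussian(lo) + gaussian(hi))
print(round(area, 6))  # 1.0 -- the prefactor 1/(sigma*sqrt(2*pi)) normalizes it
```

Dropping the prefactor (as the text does after equation (2)) simply rescales this area; the shape of the curve is unchanged.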
It is appropriate to mention that we are interested in finding probability
distributions because we wish to find average outcomes. For example, we
wish to know the average stock price, or the average interest rate.
Because we cannot predict such things in advance, we wish to remain as close as
possible to the actual outcomes. Statistics, under certain assumptions, shows one
way of achieving such targets. The gateway to such analysis is to find the probability
distribution of the outcomes of the underlying random processes. Therefore our main aim
is to study the features of probability density functions. In this article I will not go
into the detailed, almost-sure definitions of the concepts and keywords. Rather, I will
try to motivate and combine the facts we already know.
This is the point where we wish to take a turn. In actual practice, given a set of
random numbers, supposedly drawn from a Gaussian set, the pdf will be smooth and have a
proportionate line chart only if the number of points we draw is infinite. With fewer
points, a suitable pdf will not be as smooth as we see here; rather it will closely
resemble what is seen in figure 1. Below, in figure 4, we show charts with an increasing
number of points. We also draw the histogram side by side, to make ourselves
more comfortable. The histogram, similar to the line chart, provides a graphical
summary of the shape of the data's distribution. In fact, for a continuous random
number which is normally distributed, the envelope of the line chart should coincide

[Figure 1: Gaussian PDF — (a) the continuous curve exp(−x²/2)/√(2π), (b) the corresponding line chart]

with the apparent shape of the histogram if and only if we generate an infinite number of
random numbers. Even for a moderate number of points, the envelope of the histogram
will start resembling the actual pdf. If we scale the histogram by the total number of
counts, then it becomes the probability distribution function. The difference between
line chart and histogram will become apparent now. For our set, the line chart starts
to diminish as we increase the number of points in figure 4. The green lines
drawn at the base of figures 4(a,b,c,d) are line charts, and they start disappearing. A line
chart at the point x₀ counts how many times the given number x₀ occurs in the collection
X. Of course, if we are talking about a set X of discrete random numbers, then the
count for an arbitrary number drawn from the set of real numbers R may be zero, and the
line chart is the true presentation of the pdf associated with X. An example of such a
set X is the outcome of a die with, say, 6 faces. If the faces are numbered
1, 2, 3, 4, 5 and 6, then the probability of any number in (−∞, 1) or (6, ∞)
is zero. The probability of the number 1 is non-zero, but that of any number strictly
between 1 and 2 is zero. Thus we find that the pdf is actually a line chart.
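The die example can be simulated directly. The following sketch (seed and sample size are my own choices) estimates the line-chart heights, i.e. the probability of each face; values between the faces never occur and would get zero count:

```python
from collections import Counter
import random

random.seed(7)
rolls = [random.randint(1, 6) for _ in range(60000)]   # fair six-faced die

counts = Counter(rolls)
pmf = {face: counts[face] / len(rolls) for face in sorted(counts)}
for face, p in pmf.items():
    print(face, round(p, 3))   # each close to 1/6; 1.5, 2.7 etc. never appear
```

Only the six faces receive non-zero probability, which is exactly the line-chart picture described above.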
The histogram, on the other hand, deals with a seemingly continuous set, let us denote it
by Y. The set Y is not countable and contains an infinite number of points, while the set
X contains a countable but infinite number of points (consider the number of faces of the
die → ∞). The set Y closely matches the set R of real numbers.
While interpreting the charts we see a problem arising in the definition of the pdf,
because our intuition says that the sum of the probabilities of getting any real number
from a set of real numbers should not be zero. The problem comes because we are trying
to use two different definitions in one. We are using countable infinities (discrete
space) and the Riemannian probability measure (continuous space). For defining a pdf
on a set having a countably infinite number of points, we must use the Lebesgue probability
measure and Borel sets. In this note we will not discuss those things, and we stop
talking about the line chart from now onwards. For the same reason we will not compare
the line chart and the histogram any more.
To visualize the sets X and Y, imagine looking at a pixel as a square. If one wishes
to paint this square, there are a number of ways. Closest to our requirements are two:
(1) take a paint brush and paint it, or (2) take a pen with a fine tip and start
filling it with dots. Both ways we can paint the square. However, we immediately
recognize that we can keep track of the number of dots we put down, but have no idea how
many dots are created by the paint brush. We also realize that representing a set
with an uncountably infinite number of points (paint brush) by a set with a countable,
finite number of points (fine dots) is an approximation.
Borrowing the idea of the digital camera sensor, its pixels and the current flowing
through these pixels, let us try to visualize it on paper. If we choose to represent
a pixel by a square, then the current (µA) will be represented by the color of the
square. Without stressing the abstraction too much, let us have a look at figure 2. We
are now looking at a bright circular spot as seen by the sensor. We show the current
measured at pixel level on sensors with 4, 16, 64 and 256 pixels. We can notice
that as we increase the number of pixels, the lines drawn at the base of each figure
start resembling a circle. This confirms our assertion that a lower number of
pixels does not contain information as close to the actual image; see figure 3 where,
looking at the contours, we still make out a square instead of the intended circle.
Though the number of points (pixels) is countable in this case, we have drawn surfaces
connecting the points to mimic a seemingly continuous set.
In real life, prices on the stock exchange can be anything, and to collect all the
prices we would have to wait an infinite amount of time. That way we would collect all
possible prices, which we could then count, and we could find the probability distribution
function. However, waiting an infinite time to do the trade, or suggesting to wait an
infinite time, is a good sign of intellectual bankruptcy.

1.3 our aim...


Our main aim is now to find a suitable approximation to the actual pdf. To determine
the suitability we decide to look at the following properties. Assume our observable
Ô depends on a variable x, where the true pdf for x is given by P_true(x) and approximated
by P_app(x). Then we want:

• the expectation value of Ô, defined as E[Ô, P_true(x)] = ∫_x Ô P_true(x), to be as
close to E[Ô, P_app(x)] = ∫_x Ô P_app(x) as required.

• the second moment of Ô (loosely, its variance), E[Ô², P_true(x)] = ∫_x Ô² P_true(x),
to be as close as possible to E[Ô², P_app(x)] = ∫_x Ô² P_app(x).

[Figure 2: Pixel current measured on (a) 4-pixel, (b) 16-pixel, (c) 64-pixel and (d) 256-pixel sensors]

• in general, the expectation value of any power of Ô should be as close as possible
under the true and approximate pdfs.

To find an approximate representation of a true pdf we need to design various strategies
and methodologies. The foregoing discussion of the requirements for an approximate
pdf suggests that if we have a smaller set of data, then we must ensure that it represents
the entire probability space (the set of all possible outcomes).
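The suitability criteria above can be sketched numerically. In this illustrative example (my own choice of distribution and sample size), the "true pdf" is the standard normal, whose first two moments are known exactly, and the "approximate pdf" is the empirical distribution of a finite sample:

```python
import random
import statistics

random.seed(1)

# True pdf: standard normal, so E[x] = 0 and E[x^2] = 1 exactly.
true_mean, true_second_moment = 0.0, 1.0

# Approximate pdf: the empirical distribution of a finite sample.
sample = [random.gauss(0.0, 1.0) for _ in range(100000)]
approx_mean = statistics.fmean(sample)
approx_second_moment = statistics.fmean(x * x for x in sample)

print(abs(approx_mean - true_mean) < 0.05)                    # True
print(abs(approx_second_moment - true_second_moment) < 0.05)  # True
```

With only a handful of points the moments would wander much further from the true values, which is exactly why a small data set must cover the probability space well.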
Here we will take a mathematical digression and discuss how to approximate true
pdfs. We were talking about comparing the expectation values of powers of Ô for the
true and approximate pdfs. These expectations are known as the moments of the pdf
when the observable Ô is the random number itself. Below we will try to understand
why an approximate pdf is characterized by matching the moments of the collected
data with an assumed (or guessed) pdf, and then comparing them with the actual pdf.
Ideally, any well-behaved function f(x) can be expanded in terms of another well-behaved
function g(x). By well-behaved functions, in our context, we mean that all these
functions should be infinitely differentiable and continuous. We will not go into
the actual mathematical definitions of such jargon, except noting that the n-th order

[Figure 3: Pixel current on a 64-pixel sensor]

differential of a function f(x) with respect to x follows the recursive relation

      dⁿf(x)/dxⁿ = (d/dx) [ dⁿ⁻¹f(x)/dxⁿ⁻¹ ]        (3)

where df(x)/dx is defined as

      df(x)/dx = lim_{δx→0} [ f(x + δx) − f(x) ] / δx.        (4)
Our requirement is that none of these derivatives dⁿf(x)/dxⁿ yields infinity as the
limit is taken. Thus, if the pdf we have chosen is well defined, it is possible to
expand it in terms of the underlying variable, i.e. f(x) can be expanded in terms of x
around a given point x = a as follows

      f(x) = f(a) + (df/dx)|_{x=a} (x − a) + (d²f/dx²)|_{x=a} (x − a)²/2!
             + (d³f/dx³)|_{x=a} (x − a)³/3! + · · ·        (5)

Since we have assumed that the function is well behaved and differentiable up to
infinite order, each term (dⁿf/dxⁿ)|_{x=a} turns out to be just a number. Therefore
we will get a power series like

      f(x) = a₀ + a₁x + a₂x² + · · · .        (6)

Now assume the true pdf, given by g(x), is expanded as

      g(x) = b₀ + b₁x + b₂x² + · · · .        (7)

[Figure 4: Comparison of line chart and histogram for samples of 50, 500, 100 and 2000 points]

and the norm of the difference is

      |g(x) − f(x)| = |(b₀ − a₀) + (b₁ − a₁)x + (b₂ − a₂)x² + · · ·|
                    ≤ Σ_i |(b_i − a_i) x^i|.        (8)

If the norm of the difference is to be independent of x, or at least to remain as small
as possible as x → ∞, we require that |(b_i − a_i) x^i| < ε, where ε is a number greater
than zero and as small as possible. In that case we say that the two functions f(x) and
g(x) are approximately the same. In practice, it is difficult to compare the terms at
all orders, but general experience and some standard studies of other properties of the
pdfs (for example their asymptotics) give us reasonable and well-accepted answers.
For example, while studying the distribution of extreme events we study the tail part
of the pdfs. We do not have data for the central region, therefore the comparison is not
done strictly as we defined it above; but within the region of interest, people follow
the route exactly as we have discussed.
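The bound of equation (8) can be exercised on a toy pair of expansions. In this sketch (entirely my own construction), f is the series for exp(−x²) and g is a hypothetical nearby pdf whose coefficients are all perturbed by 1%; the coefficient-wise bound then dominates the pointwise difference, as the triangle inequality promises:

```python
import math

# Taylor coefficients of f(x) = exp(-x^2): sum_k (-1)^k x^(2k) / k!
def coeff_f(i):
    if i % 2:
        return 0.0
    k = i // 2
    return (-1) ** k / math.factorial(k)

# A hypothetical nearby pdf g with every coefficient perturbed by 1%.
def coeff_g(i):
    return coeff_f(i) * 1.01

x, order = 0.5, 20
f_val = sum(coeff_f(i) * x ** i for i in range(order))
g_val = sum(coeff_g(i) * x ** i for i in range(order))
bound = sum(abs((coeff_g(i) - coeff_f(i)) * x ** i) for i in range(order))
print(abs(g_val - f_val) <= bound + 1e-12)   # True: equation (8) in action
```

Making the perturbation smaller shrinks the bound, which is the ε of the discussion above.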

In the discussion above we established that we can approximate the true pdf of the
outcomes of a random process without waiting for an infinite time. A smaller set of
data which satisfies certain conditions is used to construct the pdf. Here we will
discuss how we do it numerically. The process is defined below.

1. we collect the data; let us represent them by x_i.

2. we define the average of N such x_i's by

         (1/N) Σ_{i=1}^{N} x_i.

   This is consistent with our definition of the expectation value of x_i: if c_j is
   the count associated with the unique value x_j, and there are M < N unique values,
   then

         Σ_{i=1}^{N} x_i = Σ_{j=1}^{M} c_j x_j.

   Dividing this by N, and recognizing that the probability of x_j is defined as
   p_j = c_j/N, we can rewrite the expectation value of x_i as

         Σ_j p_j x_j

   where the x_j are the unique entries.

3. In a similar fashion we can compute the expectation value of any power of x_i.
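The three steps above can be sketched directly. This illustrative snippet (the data values are hypothetical) shows that the plain average and the count-weighted average Σ_j p_j x_j agree:

```python
from collections import Counter

data = [2, 3, 2, 5, 3, 2, 5, 5, 5, 3]   # hypothetical observations x_i
N = len(data)

plain_mean = sum(data) / N                            # step 2: (1/N) sum_i x_i

counts = Counter(data)                                # c_j for each unique x_j
probs = {x: c / N for x, c in counts.items()}         # p_j = c_j / N
weighted_mean = sum(p * x for x, p in probs.items())  # sum_j p_j x_j

print(plain_mean, round(weighted_mean, 9))  # identical up to float rounding
```

Replacing x with x², x³, and so on in the weighted sum gives the higher moments of step 3.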

The p_i defined here can be understood as the f(x = x_i) we discussed earlier. Now it
is up to our expertise to decide what the equivalent of g(x) should be. We do not know
what to do at this point; therefore, we start by looking at various well-known, less-known
or unknown pdfs and computing the same expectation values. Based on our earlier
discussions, if the moments match up to a certain order then we say that we have
approximated the true pdf. Sometimes the experience of other people comes to our help,
and we can easily guess the pdf, for example log-normal or normal.
Now let us narrow down our thoughts and discussion and consider doing some
mathematics. This will be slightly heavier than usual, therefore I will keep the
discussion intuitive and keep comparing with what we have already discussed. We
will define a function known as Dirac's delta function δ(x − a), having the property

      ∫_{−∞}^{∞} h(x) δ(x − a) dx = h(a).        (10)

Intuitively, δ(x − a) can be thought of as the height of the line in a line chart, and
h(a) as the height of the histogram. Assume h(x) = 1; then

      ∫_{−∞}^{∞} δ(x − a) dx = 1.        (11)

In this way we can also say that δ(x − a) behaves as a probability distribution
function. There are various representations of Dirac's delta function; one which is
closely related to our discussion is

      δ(x − a) = 1   for x = a
               = 0   otherwise.        (12)

Thus a histogram is represented as

      Σ_i δ(x − a_i), a_i ∈ (z, z + dz) = count(x such that x ∈ (z, z + dz)).

It is this observation which is useful for the purpose of this note. Remember, while
drawing a line chart or a histogram we first find the count(x such that x ∈ (z, z + dz)),
where both z and x are from the same set. To clarify this statement we write the algorithm
used to find the count:
for every x in the set X
{
    define count for x = 0
    for given point z from the set X
    {
        define the region z and z+dz
        if (x >= z AND x < z+dz) then
            increase count by 1.
    }
    output count for x
}
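A minimal Python rendering of this pseudocode might look as follows (the function name and the equally-spaced-edges assumption are mine; the comparison is written as z ≤ x < z + dz):

```python
def histogram_counts(data, z_edges):
    """Count, for each region [z, z+dz), how many points of `data` fall inside.

    `z_edges` are the left edges of the regions; a plain rendering of the
    pseudocode above, assuming equally spaced edges.
    """
    dz = z_edges[1] - z_edges[0]
    counts = [0] * len(z_edges)
    for x in data:
        for i, z in enumerate(z_edges):
            if z <= x < z + dz:       # the region test of the pseudocode
                counts[i] += 1
                break
    return counts

data = [0.1, 0.4, 0.45, 0.9, 1.2]
print(histogram_counts(data, [0.0, 0.5, 1.0]))  # [3, 1, 1]
```

Dividing each count by the total number of points then yields the normalized histogram, i.e. the pdf estimate discussed next.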
It is now just a matter of choice whether we wish to associate this count with one
special point of the region (z, z + dz), creating a line chart, or with all points in
the region (z, z + dz), creating a histogram. Ultimately we need to normalize the
histogram to represent it as a pdf. Thus the height of the line chart cannot be compared
with the height of the histogram in raw form; however, the scaled heights will come
closer to each other.
Here we wish to introduce another method of computing the count(x such that x ∈
(z, z + dz)), by recognizing that we can also represent it in terms of the Dirac delta
function. That is, we are now interested in computing

      p(x) = ∫_{−∞}^{∞} δ(x − a) da.

We approximate the delta function in the following way:

      lim_{τ→0} 1/(x − a + iτ) = P( 1/(x − a) ) − iπ δ(x − a)        (13)

where P() denotes the principal value of 1/(x − a). Thus the delta function is computed as

      δ(x − a) = −(1/π) Imag[ lim_{τ→0} 1/(x − a + iτ) ].        (14)

When we integrate over a we get the probability of x. This is equivalent to saying that
the line at each unique random value is now replaced by a Lorentzian, τ/((x − a)² + τ²).
In a histogram it is replaced by a rectangle of height 1 in a certain unit, and in a line
chart it is replaced by a line. This method combines all we know about the histogram and
the line chart and introduces another way of looking at pdfs. Since the representation of
the count is directly related to the probability, we get hard boundaries for histograms
and line charts; the Lorentzian way of computing the pdf, by contrast, systematically
transfers weight to the tails. One artifact it creates is the need to choose a value of
τ, but we can use this parameter to control the tail distribution, thereby gaining extra
control. Below we compare the results of our thought experiment and show how the three
different methods perform for a set of random numbers generated from a known pdf: the
true pdf is known, we simulate the random numbers, compute the moments using the three
methods, and then compare them with the known answers.
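As a sketch of how the Lorentzian method can be set up (this is my own illustrative code, not the author's original experiment), equation (14) turns directly into a density estimator: each observation contributes a Lorentzian bump of width τ, averaged over the sample:

```python
import math
import random

def lorentzian_pdf(x, data, tau=0.1):
    """Approximate p(x) by replacing the delta spike at each data point a
    with the Lorentzian (1/pi) * tau / ((x - a)^2 + tau^2), averaged over
    the sample. tau controls how much weight leaks into the tails."""
    n = len(data)
    return sum(tau / ((x - a) ** 2 + tau ** 2) for a in data) / (math.pi * n)

random.seed(3)
sample = [random.gauss(0.0, 1.0) for _ in range(20000)]

# Near the centre the estimate should be close to the true N(0,1) density,
# with a small downward bias at the peak caused by the finite tau.
true_at_0 = 1.0 / math.sqrt(2.0 * math.pi)      # ~0.3989
print(round(lorentzian_pdf(0.0, sample), 3), round(true_at_0, 3))
```

Increasing τ smooths the estimate further but transfers more weight into the tails, which is precisely the trade-off described in the paragraph above.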

