$$\hat f(x)=\sum_{i=1}^{n}\frac{1}{n}\,k_{x_i}(x) \qquad \text{(kernel smoothed density function)}$$
$$\hat F(x)=\sum_{i=1}^{n}\frac{1}{n}\,K_{x_i}(x) \qquad \text{(kernel smoothed distribution function)}$$
This is indeed a density function. The corresponding distribution function F̂ is called the kernel smoothed distribution.
The two most commonly used kernels are the uniform kernel and the triangular kernel.
1. Uniform Kernel
A uniform kernel is a uniform distribution over some interval [x* − b, x* + b]. The number b is called the bandwidth. Its density function is:
$$k_{x^*}(x)=\begin{cases}\dfrac{1}{2b}, & x\in[x^*-b,\;x^*+b]\\[4pt] 0, & \text{otherwise}\end{cases}$$
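The corresponding kernel distribution function, written by cases on the position of x* relative to the interval [x − b, x + b], is:
$$K_{x^*}(x)=\begin{cases}1, & x^*\le x-b\\[4pt] \dfrac{(x+b)-x^*}{2b}=\dfrac{\text{right end-point of }[x-b,\,x+b]\;-\;x^*}{\text{length of the interval}}, & x-b\le x^*\le x+b\\[4pt] 0, & x+b\le x^*\end{cases}$$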
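The two uniform-kernel formulas above can be sketched in Python as follows. This is a minimal illustration, not the author's code: the function names `uniform_k` and `uniform_K` are hypothetical, `center` plays the role of x* and `b` is the bandwidth.

```python
def uniform_k(x, center, b):
    """Uniform kernel density k_{x*}(x): 1/(2b) on [x* - b, x* + b], else 0."""
    if center - b <= x <= center + b:
        return 1.0 / (2 * b)
    return 0.0


def uniform_K(x, center, b):
    """Uniform kernel distribution function K_{x*}(x), case by case."""
    if center <= x - b:
        # the whole kernel interval lies at or below x
        return 1.0
    if center >= x + b:
        # the whole kernel interval lies at or above x
        return 0.0
    # (right end-point of [x - b, x + b] minus x*) / (length of the interval)
    return (x + b - center) / (2 * b)
```

For example, `uniform_K(635, 637, 15)` evaluates (650 − 637)/30 = 13/30, which is the value used for each 637 in the next example.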
Example. You are given the following data:
600, 615, 618, 620, 637, 637, 645, 675, 685, 690
Use kernel smoothing with a uniform kernel of bandwidth 15 to estimate the distribution. What are the estimates f̂(635) and F̂(635)?
Solution.
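The kernel density function has the value 1/(2b) = 1/30. There are 4 points in the interval [635 − 15, 635 + 15] = [620, 650], namely 620, 637, 637 and 645. So
$$\hat f(635)=4\cdot\frac{1}{10}\cdot\frac{1}{30}=\frac{4}{300}\approx 0.0133$$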
To calculate F̂(635), we locate the interval [620, 650] on the real line and then put the data points on the real line. The points 600, 615 and 618 are below the interval, so each gets the value 1; the points 675, 685 and 690 are above it, so each gets the value 0. The points that are below or inside the interval get these values:
600 ⇒ 1
615 ⇒ 1
618 ⇒ 1
620 ⇒ 30/30
637 ⇒ 2(13/30)
645 ⇒ 5/30
Total = 151/30; multiplying by 1/10 gives F̂(635) = 151/300 ≈ 0.5033
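As a check on the arithmetic of this example, here is a short sketch that evaluates both estimates with exact fractions. The helper names (`uniform_k`, `uniform_K`, `smoothed`) are hypothetical and every observation gets the empirical weight 1/n.

```python
from fractions import Fraction


def uniform_k(x, center, b):
    # density of the uniform kernel centered at `center`, exact arithmetic
    return Fraction(1, 2 * b) if center - b <= x <= center + b else Fraction(0)


def uniform_K(x, center, b):
    # distribution function of the uniform kernel centered at `center`
    if center <= x - b:
        return Fraction(1)
    if center >= x + b:
        return Fraction(0)
    return Fraction(x + b - center, 2 * b)


def smoothed(data, x, b):
    """Return (f_hat(x), F_hat(x)) with equal weights 1/n."""
    n = len(data)
    f_hat = sum(uniform_k(x, xi, b) for xi in data) / n
    F_hat = sum(uniform_K(x, xi, b) for xi in data) / n
    return f_hat, F_hat


data = [600, 615, 618, 620, 637, 637, 645, 675, 685, 690]
f_hat, F_hat = smoothed(data, 635, 15)
print(f_hat, F_hat, float(F_hat))
```

The results should agree with the hand computation: f̂(635) = 4/300 and F̂(635) = 151/300.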
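The graph of any uniform-kernel-smoothed distribution function is continuous, nondecreasing and piecewise linear: at each x its slope is f̂(x), that is, 1/(2nb) times the number of data points within distance b of x.
[Figure: graph of a uniform-kernel-smoothed distribution function]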
Note. If the observed values {x1, ..., xn} have probabilities p(xi) other than 1/n, then the kernel-smoothed density is
$$\hat f(x)=\sum_{i=1}^{n}p(x_i)\,k_{x_i}(x)$$
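The weighted formula in this note can be sketched as a function that accepts arbitrary probabilities p(xi). This is an illustrative sketch only: `weighted_kernel_density` is a hypothetical name, and the kernel is passed in as any function with the signature (x, center, b), defaulting to the uniform kernel.

```python
def uniform_k(x, center, b):
    # uniform kernel density centered at `center` with bandwidth b
    return 1.0 / (2 * b) if center - b <= x <= center + b else 0.0


def weighted_kernel_density(x, points, probs, b, kernel=uniform_k):
    """f_hat(x) = sum_i p(x_i) * k_{x_i}(x); probs need not all equal 1/n."""
    if len(points) != len(probs):
        raise ValueError("points and probs must have the same length")
    return sum(p * kernel(x, xi, b) for xi, p in zip(points, probs))
```

With all probabilities equal to 1/n this reduces to the original definition of f̂; passing a different `kernel` function swaps in another kernel without changing the weighting.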
Example (exercise 12.30 of the textbook). You are given the following ages at time of death for 10 individuals:
25, 30, 35, 35, 37, 39, 45, 47, 49, 55
Using a uniform kernel with a bandwidth of b = 10, determine the kernel density estimate of the probability of survival to age 40.
Solution. We need to find Ŝ(40), so we first calculate F̂(40). The kernel density function has the value 1/(2b) = 1/20. There are 8 points in the interval [40 − 10, 40 + 10] = [30, 50].
We locate the interval [30, 50] on the real line and then put all data points on the real line. The point 25 is below this interval, so it gets the value 1; the point 55 is above it, so it gets the value 0. The points that are below or inside the interval get these values:
25 ⇒ 1
30 ⇒ 20/20
35 ⇒ 2(15/20)
37 ⇒ 13/20
39 ⇒ 11/20
45 ⇒ 5/20
47 ⇒ 3/20
49 ⇒ 1/20
Total = 103/20; multiplying by 1/10 gives F̂(40) = 103/200 = 0.515
Therefore Ŝ(40) = 1 − F̂(40) = 1 − 0.515 = 0.485.
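The survival estimate can be reproduced with a few lines of exact arithmetic. This sketch reuses a hypothetical `uniform_K` helper matching the uniform-kernel distribution function defined earlier.

```python
from fractions import Fraction


def uniform_K(x, center, b):
    # uniform-kernel distribution function K_{x*}(x), exact arithmetic
    if center <= x - b:
        return Fraction(1)
    if center >= x + b:
        return Fraction(0)
    return Fraction(x + b - center, 2 * b)


ages = [25, 30, 35, 35, 37, 39, 45, 47, 49, 55]
F_40 = sum(uniform_K(40, a, 10) for a in ages) / len(ages)
S_40 = 1 - F_40
print(F_40, S_40, float(S_40))
```

Here F_40 should equal 103/200 and S_40 should equal 97/200 = 0.485.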
Note. This data set could have been given in the following format:
tj    sj    rj
25    1     10
30    1     9
35    2     8
37    1     6
39    1     5
45    1     4
47    1     3
49    1     2
55    1     1
(tj = time of death, sj = number of deaths at tj, rj = number at risk just before tj)
Example (exercise 12.29 of the textbook)∗. You are given the data in the table below on time to death. Using the uniform kernel with a bandwidth of 60, determine f̂(100).
tj    sj    rj
10    1     20
34    1     19
47    1     18
75    1     17
156   1     16
171   1     15
Solution. The data are complete (no censoring) for the part given, and 20 lives are at risk at the start, so each death has probability of occurrence 1/20.
The kernel density function has the value 1/(2b) = 1/120. There are 3 points in the interval [100 − 60, 100 + 60] = [40, 160], namely 47, 75 and 156. So
$$\hat f(100)=3\cdot\frac{1}{20}\cdot\frac{1}{120}=\frac{3}{2400}=0.00125$$
The factor 3/20, the total probability of the data points in [40, 160], is nothing but P(40 ≤ X ≤ 160) = P(X ≤ 160) − P(X < 40) = F(160) − F(40−). In some examples the underlying distribution might differ from the empirical distribution, in which case we use the appropriate F to calculate a difference like F(160) − F(40−); see the next example.
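The identity f̂(100) = (1/2b) · P(40 ≤ X ≤ 160) can be checked numerically. This sketch assumes only the six deaths shown in the table matter (any unlisted deaths occur after 171, outside [40, 160]); `uniform_k` is a hypothetical helper.

```python
from fractions import Fraction


def uniform_k(x, center, b):
    # uniform kernel density, exact arithmetic
    return Fraction(1, 2 * b) if center - b <= x <= center + b else Fraction(0)


times = [10, 34, 47, 75, 156, 171]  # observed deaths from the table
p = Fraction(1, 20)                 # empirical probability of each death (20 at risk initially)

# direct kernel-smoothed density at 100 with bandwidth 60
f_100 = sum(p * uniform_k(100, t, 60) for t in times)

# the same value as (1 / 2b) * P(40 <= X <= 160)
mass = sum(p for t in times if 40 <= t <= 160)
print(f_100, mass / 120, float(f_100))
```

Both expressions should give 1/800 = 0.00125.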
Example. In the previous example, suppose that the Kaplan-Meier estimate of S(t) is used to calculate the probabilities. Answer the same question.
Solution.
We have:
$$P(40\le X\le 160)=P(X\le 160)-P(X<40)=F(156)-F(34)=S(34)-S(156)$$
since the Kaplan-Meier estimate has no jumps in (156, 160] or in (34, 40). Then
$$S(34)-S(156)=\frac{19}{20}\cdot\frac{18}{19}-\frac{19}{20}\cdot\frac{18}{19}\cdots\frac{15}{16}=\frac{18}{20}-\frac{15}{20}=\frac{3}{20}=0.15$$
Then:
$$\hat f(100)=\frac{1}{120}\,P(40\le X\le 160)=\frac{1}{120}(0.15)=0.00125 \quad\checkmark$$
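A sketch of the same computation with the Kaplan-Meier product-limit estimate, under the assumption that only the rows shown in the table are needed. The function `km_survival` and its `strict` flag (for S(t−)) are hypothetical names introduced for illustration.

```python
from fractions import Fraction

# (t_j, s_j, r_j) rows from the table: death time, deaths, number at risk
table = [(10, 1, 20), (34, 1, 19), (47, 1, 18), (75, 1, 17), (156, 1, 16), (171, 1, 15)]


def km_survival(t, rows, strict=False):
    """Kaplan-Meier S(t) = product of (1 - s_j / r_j) over t_j <= t.

    With strict=True the product runs over t_j < t, giving S(t-).
    """
    s = Fraction(1)
    for tj, sj, rj in rows:
        if tj < t or (tj == t and not strict):
            s *= 1 - Fraction(sj, rj)
    return s


x, b = 100, 60
# P(x - b <= X <= x + b) = F(x + b) - F((x - b)-) = S((x - b)-) - S(x + b)
prob = km_survival(x - b, table, strict=True) - km_survival(x + b, table)
f_hat = prob / (2 * b)
print(prob, f_hat, float(f_hat))
```

Since there is no censoring among the rows shown, the Kaplan-Meier probabilities coincide with the empirical ones here, which is why both examples give 0.00125.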
Note. In the professional exam, when no probabilities are mentioned, you should use the empirical probabilities, which are all equal to 1/n.
2. Triangular Kernel
Example (from Finan’s study guide). You are given the following ages at time of death of 10 individuals:
25, 30, 35, 35, 37, 39, 45, 47, 49, 55
Using a triangular kernel with bandwidth 10, find the kernel smoothed density estimate f̂(40).
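Solution. The base of the triangle centered at x = 40 is the interval [40 − 10, 40 + 10] = [30, 50]. So:
$$k_{40}(x)=\begin{cases}\dfrac{10-|x-40|}{100}, & x\in[30,\;50]\\[4pt] 0, & \text{otherwise}\end{cases}$$
Because the triangular kernel is symmetric, k_{x*}(40) = k_40(x*), so each data point x* contributes k_40(x*) to the sum.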
From the data set, only the following values fall in the interval [30, 50]:
30, 35, 35, 37, 39, 45, 47, 49
These eight points get these values:
30 ⇒ 0
35 ⇒ 2(5/100)
37 ⇒ 7/100
39 ⇒ 9/100
45 ⇒ 5/100
47 ⇒ 3/100
49 ⇒ 1/100
Total = 35/100; multiplying by 1/10 gives f̂(40) = 0.035 ✓
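The triangular-kernel density estimate can be checked with a short sketch. The name `triangular_k` is hypothetical; it implements the density (b − |x − x*|)/b² on [x* − b, x* + b], which with b = 10 is the k_40 shown above.

```python
from fractions import Fraction


def triangular_k(x, center, b):
    """Triangular kernel density: (b - |x - x*|) / b^2 on [x* - b, x* + b], else 0."""
    d = abs(x - center)
    return Fraction(b - d, b * b) if d <= b else Fraction(0)


ages = [25, 30, 35, 35, 37, 39, 45, 47, 49, 55]
f_40 = sum(triangular_k(40, a, 10) for a in ages) / len(ages)
print(f_40, float(f_40))
```

Points outside [30, 50] (here 25 and 55) contribute 0, as does the end-point 30, so the sum reproduces 35/100 before dividing by 10.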
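The distribution function of the triangular kernel with bandwidth b is:
$$K_{x^*}(x)=\begin{cases}1, & x^*\le x-b\\[4pt] 1-\dfrac{(b-|x-x^*|)^2}{2b^2}, & x^*\in[x-b,\;x]\\[4pt] \dfrac{(b-|x-x^*|)^2}{2b^2}, & x^*\in[x,\;x+b]\\[4pt] 0, & x+b\le x^*\end{cases}$$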
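The four cases above can be sketched in Python and cross-checked against a crude numerical integral of the triangular density. All function names are hypothetical, and `numeric_K` is only a midpoint-rule sanity check, not part of the method.

```python
def triangular_k(t, center, b):
    """Triangular kernel density: (b - |t - x*|) / b^2 on [x* - b, x* + b], else 0."""
    d = abs(t - center)
    return (b - d) / (b * b) if d <= b else 0.0


def triangular_K(x, center, b):
    """Triangular kernel distribution function K_{x*}(x), case by case."""
    if center <= x - b:
        return 1.0
    if center >= x + b:
        return 0.0
    tail = (b - abs(x - center)) ** 2 / (2 * b * b)
    # x* in [x - b, x]: only the right-hand tail of the triangle lies above x
    # x* in [x, x + b]: only the left-hand tail of the triangle lies below x
    return 1.0 - tail if center <= x else tail


def numeric_K(x, center, b, steps=20_000):
    """Midpoint-rule integral of the density from x* - b up to x (cross-check)."""
    lo, hi = center - b, min(x, center + b)
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    return h * sum(triangular_k(lo + (i + 0.5) * h, center, b) for i in range(steps))


for c in (31, 35, 37, 40, 45, 49):
    print(c, triangular_K(40, c, 10), numeric_K(40, c, 10))
```

At x* = x both middle cases give 1/2, which confirms that the pieces join continuously.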
Example. In the previous example, find the triangular-kernel-smoothed estimate of the distribution function F̂(40).
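Solution. The base of the triangle centered at x = 40 is the interval [30, 50], and 2b² = 200. So:
$$K_{x^*}(40)=\begin{cases}1, & x^*\le 30\\[4pt] 1-\dfrac{(10-|40-x^*|)^2}{200}, & x^*\in[30,\;40]\\[4pt] \dfrac{(10-|40-x^*|)^2}{200}, & x^*\in[40,\;50]\\[4pt] 0, & 50\le x^*\end{cases}$$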
From the data set, the value 25 lies to the left of [30, 50], so it gets the value 1; the value 55 lies to the right, so it gets the value 0; and the following values fall in [30, 50]:
30, 35, 35, 37, 39, 45, 47, 49
25 ⇒ 1
30 ⇒ 1
35 ⇒ 2(1 − 25/200)
37 ⇒ 1 − 49/200
39 ⇒ 1 − 81/200
45 ⇒ 25/200
47 ⇒ 9/200
49 ⇒ 1/200
55 ⇒ 0
Total = 1055/200; multiplying by 1/10 gives F̂(40) = 1055/2000 = 0.5275 ✓
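The total above can be verified with exact arithmetic. The helper name `triangular_K` is hypothetical and implements the triangular-kernel distribution function. Note that the point 25, which lies entirely to the left of [30, 50], contributes a full 1.

```python
from fractions import Fraction


def triangular_K(x, center, b):
    # exact triangular-kernel distribution function K_{x*}(x)
    if center <= x - b:
        return Fraction(1)
    if center >= x + b:
        return Fraction(0)
    tail = Fraction((b - abs(x - center)) ** 2, 2 * b * b)
    return 1 - tail if center <= x else tail


ages = [25, 30, 35, 35, 37, 39, 45, 47, 49, 55]
for a in sorted(set(ages)):
    print(a, triangular_K(40, a, 10))  # contribution of each distinct value

F_40 = sum(triangular_K(40, a, 10) for a in ages) / len(ages)
print(F_40, float(F_40))
```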
3. Calculating Mean and Variance of Kernel-Smoothed Distributions
Example. You are given the following data:
11, 16, 19, 21
Suppose that we smooth the data with a uniform kernel of bandwidth 2. Calculate the mean and variance of the smoothed distribution.
Solution. Let X denote the smoothed random variable, and let Y be the discrete random variable that assigns the empirical probability 1/4 to each observation. The conditional random variable (X | Y = 11) is a uniform distribution centered at 11, so E(X | Y = 11) = 11. The same argument holds for the other three observations, so E(X | Y) = Y. Therefore:
$$E(X)=E[E(X\mid Y)]=E(Y)=\frac{11+16+19+21}{4}=\frac{67}{4}=16.75 \quad\checkmark$$
(The equality E(X) = E(Y) is not special to the uniform kernel: it holds for any kernel whose mean is the data point it is centered at. For symmetric kernels, such as the triangular kernel, this follows from symmetry.)
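$$\operatorname{Var}(X\mid Y=11)=\operatorname{Var}(\text{uniform on an interval of length }4)=\frac{4^2}{12}=\frac{4}{3}$$
This holds equally well for the other observations, so the conditional random variable Var(X | Y) is the constant 4/3, and therefore its mean is also 4/3: E[Var(X | Y)] = 4/3. On the other hand,
$$\operatorname{Var}[E(X\mid Y)]=\operatorname{Var}(Y)=\frac{11^2+16^2+19^2+21^2}{4}-16.75^2=294.75-280.5625=14.1875$$
So then:
$$\operatorname{Var}(X)=\operatorname{Var}[E(X\mid Y)]+E[\operatorname{Var}(X\mid Y)]=14.1875+\frac{4}{3}\approx 15.5208 \quad\checkmark$$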
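The conditional-moment argument translates directly into a few lines of exact arithmetic. This is a sketch under the section's assumptions (uniform kernel, bandwidth 2, equal weights 1/4); the variable names are illustrative.

```python
from fractions import Fraction

data = [11, 16, 19, 21]
b = 2
n = len(data)

# E(X | Y = y) = y for a kernel centered at y, so E(X) = E(Y)
mean = Fraction(sum(data), n)

# Var[E(X | Y)] = Var(Y): the empirical variance with divisor n
var_y = Fraction(sum(y * y for y in data), n) - mean ** 2

# E[Var(X | Y)] = variance of a uniform on an interval of length 2b
var_kernel = Fraction((2 * b) ** 2, 12)

var_x = var_y + var_kernel
print(float(mean), float(var_y), float(var_x))
```

The exact values are E(X) = 67/4, Var(Y) = 227/16 = 14.1875 and Var(X) = 745/48 ≈ 15.5208.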
Second method (a longer method). We may also use the density function of the smoothed distribution to calculate the mean and variance of X. The smoothed density function is this step function:
x       (−∞, 9)   (9, 13)   (13, 14)   (14, 17)   (17, 18)   (18, 19)   (19, 21)   (21, 23)   (23, ∞)
f(x)    0         1/16      0          1/16       2/16       1/16       2/16       1/16       0
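$$E(X)=\frac{1}{16}\left\{\left[\frac{x^2}{2}\right]_{9}^{13}+\left[\frac{x^2}{2}\right]_{14}^{17}+\left[x^2\right]_{17}^{18}+\left[\frac{x^2}{2}\right]_{18}^{19}+\left[x^2\right]_{19}^{21}+\left[\frac{x^2}{2}\right]_{21}^{23}\right\}=\frac{268}{16}=16.75$$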
$$E(X^2)=\frac{1}{16}\left\{\int_{9}^{13}x^2\,dx+\int_{14}^{17}x^2\,dx+\int_{17}^{18}2x^2\,dx+\int_{18}^{19}x^2\,dx+\int_{19}^{21}2x^2\,dx+\int_{21}^{23}x^2\,dx\right\}$$
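The piecewise integrals of the second method can be evaluated exactly from the step-function table. The helper `moment` and the `pieces` list are hypothetical names. Each piece (a, c, h) contributes h · (c^(k+1) − a^(k+1)) / (k + 1) to E(X^k).

```python
from fractions import Fraction

# (left end, right end, height) of each nonzero piece of the smoothed density
pieces = [
    (9, 13, Fraction(1, 16)),
    (14, 17, Fraction(1, 16)),
    (17, 18, Fraction(2, 16)),
    (18, 19, Fraction(1, 16)),
    (19, 21, Fraction(2, 16)),
    (21, 23, Fraction(1, 16)),
]


def moment(k):
    """E(X^k) = sum over pieces of height * (c^(k+1) - a^(k+1)) / (k + 1)."""
    return sum(h * Fraction(c ** (k + 1) - a ** (k + 1), k + 1) for a, c, h in pieces)


total = moment(0)    # total probability, should be 1 for a density
mean = moment(1)
second = moment(2)
variance = second - mean ** 2
print(total, mean, second, variance, float(variance))
```

The second method agrees with the first: E(X) = 67/4 and E(X²) = 14212/48, so Var(X) = 745/48 ≈ 15.5208.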