
A Mixed Poisson-Inverse-Gaussian Regression Model

Author(s): C. Dean, J. F. Lawless and G. E. Willmot


Source: The Canadian Journal of Statistics / La Revue Canadienne de Statistique, Vol. 17, No. 2 (Jun., 1989), pp. 171-181
Published by: Statistical Society of Canada
Stable URL: http://www.jstor.org/stable/3314846


The Canadian Journal of Statistics / La Revue Canadienne de Statistique
Vol. 17, No. 2, 1989, Pages 171-181

A mixed Poisson-inverse-Gaussian regression model*

C. DEAN, J.F. LAWLESS, AND G.E. WILLMOT


Key words and phrases: regression analysis of counts, extra-Poisson variation, maximum likelihood, quasilikelihood, quadratic estimating equations.
AMS 1985 subject classifications: 62J02.

ABSTRACT

The mixed Poisson-inverse-Gaussian distribution has been used by Holla, Sankaran, Sichel, and others in univariate problems involving counts. We propose a Poisson-inverse-Gaussian model which can be used for regression analysis of counts. The model provides an attractive framework for incorporating random effects in Poisson regression models and in handling extra-Poisson variation. Maximum-likelihood and quasilikelihood-moment estimation is investigated and illustrated with an example involving motor-insurance claims.

RÉSUMÉ (translated)

A weighted mixture of Poisson laws, with inverse-Gaussian mixing weights, has been used by Holla, Sankaran, Sichel, and others as a univariate model in counting problems. We propose a regression model based on such a mixture. It can be used in regression analyses of counts and permits the incorporation of random effects into a Poisson-based regression model, as well as the treatment of variation not accounted for by the Poisson law. Estimation by the method of maximum likelihood and by the quasilikelihood/moment method is studied and illustrated with data concerning automobile-insurance claims.

1. INTRODUCTION
Mixed Poisson distributions have been found useful in situations where counts display extra-Poisson variation. Applications of univariate models abound in areas such as insurance (e.g., Willmot 1987) and biology (e.g., Anscombe 1950), where specific models such as the negative-binomial and Poisson-inverse-Gaussian are widely used. Mixed Poisson regression models have been employed in areas such as demography (e.g., Brillinger 1986), medicine (e.g., Breslow 1984), and engineering (e.g., Engel 1984). With regression data, statistical analysis is often based on weighted least-squares or quasilikelihood methods, but fully parametric models such as the negative-binomial (e.g., Engel 1984, Lawless 1987) or Poisson-normal mixture (e.g., Hinde 1982) are also used, particularly when more than the first two moments are of interest.
In some applications, for example in insurance, it is useful to fit a specific probability distribution to the data. A factor somewhat inhibiting the use of fully parametric methods is that inference for many mixed Poisson regression models requires the use of numerical integration (e.g., Brillinger 1986, Section 6), negative-binomial models being a notable exception. For univariate data, the Poisson-inverse-Gaussian mixture has been shown to be an attractive and easily used model (e.g., Holla 1966, Sankaran 1968, Sichel 1971, Ord and Whitmore 1986, Willmot 1987). In this paper we consider a regression form of the model, and show that statistical methods for it are computationally straightforward. This provides an attractive alternative to negative-binomial models when a longer-tailed distribution is required.

*Research was supported in part by grants to J.F. Lawless and G.E. Willmot from the Natural Sciences and Engineering Research Council of Canada.
Section 2 introduces the model, and Section 3 develops maximum-likelihood estimation. Section 4 examines the efficiency of quasilikelihood-moment methods relative to maximum likelihood. Section 5 contains an example involving motor-insurance claims.

2. A POISSON-INVERSE-GAUSSIAN REGRESSION MODEL
Let Y be a response variable, and let x be an associated k × 1 vector of covariates. A Poisson regression model for Y would stipulate that, given x, Y has a Poisson distribution with mean μ(x). There are various ways to introduce random effects or extra-Poisson variation into such a model. The approach we discuss below is a natural and flexible one which has been used by numerous other authors, such as Brillinger (1986), Engel (1984), Hinde (1982), and Lawless (1987). We note, however, that other models also have appeal, and we discuss this momentarily.
We consider mixed Poisson regression models with

Pr(Y = y | x) = ∫₀^∞ [e^{−vμ(x)} {vμ(x)}^y / y!] g(v) dv,   y = 0, 1, ...,   (2.1)

where g(v) is a probability density function and μ(x) is a positive-valued function. Such models can be viewed as multiplicative Poisson random-effects models (e.g., Brillinger 1986) where, given the fixed covariates x and a random effect v with density g(v), v > 0, the response Y has a Poisson distribution with mean vμ(x). We assume that μ(x) depends on a vector β of unknown regression coefficients and, without loss of generality, that E(v) = 1. This parametrization has the appealing property that when μ(x) takes the common log-linear form μ(x) = exp(x′β), random and fixed effects are added on the same (exponential) scale.
It is well known that the assumption of a gamma distribution for v leads to a negative-binomial model. In this paper, we assume an inverse Gaussian distribution (e.g., Folks and Chhikara 1978, Tweedie 1957) for v, with density

g(v) = (2πτv³)^{−1/2} exp{−(v − 1)²/(2τv)},   v > 0.   (2.2)

The parameter τ is unknown, and equals Var(v). The distribution of Y given x resulting from (2.1) is then a Poisson-inverse-Gaussian (P-IG) regression model, with mean and variance functions μ(x) and μ(x) + τμ(x)² respectively. For convenience we will write Y ~ P-IG(μ(x), τ) to denote this model. The model provides a heavier-tailed alternative to the negative-binomial and is more tractable than other Poisson mixtures (2.1).
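Because the model is a multiplicative mixture, P-IG counts are easy to simulate: draw v from an inverse Gaussian with mean 1 and variance τ, then draw a Poisson count with mean vμ. A minimal sketch (the function and all names are ours, not from the paper); numpy's wald(mean, scale) has variance mean³/scale, so scale = 1/τ gives Var(v) = τ:

```python
import numpy as np

def simulate_pig(mu, tau, n, rng):
    """Simulate n observations from P-IG(mu, tau) by multiplicative mixing."""
    v = rng.wald(1.0, 1.0 / tau, size=n)  # random effects: E(v) = 1, Var(v) = tau
    return rng.poisson(v * mu)            # Y | v ~ Poisson(v * mu)
```

For example, with μ = 5 and τ = 0.2 the sample mean and variance of a large sample should be close to μ = 5 and μ + τμ² = 10.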
A simple extension of the model is also often useful. Suppose that given the random effect vᵢ, Yᵢ is Poisson with mean vᵢ μ(xᵢ; β) Tᵢ, where Tᵢ is a known measure of exposure. In some situations vᵢ and, in particular, its variance might depend on Tᵢ. For example, if Yᵢ is obtained as an aggregate count by adding across counts with a common xᵢ but different exposures, then a plausible model might be to take Var(vᵢ) = τ/Tᵢ. The model (2.1) arising from this can be fitted with only slight alterations to the procedures in Section 3, and is discussed in the example of Section 5.
We note that Jorgensen (1987) and Stein and Juritz (1988) consider other P-IG regression models. Each has attractive features: Jorgensen's is a discrete exponential dispersion model and satisfies an appealing convolution property, and Stein and Juritz's is structured so that the regression parameters β are orthogonal to the parameter (analogous to our τ) specifying the degree of extra-Poisson variation. Neither model has, however, the simple structure of those we consider in terms of the multiplicative random effects.
We note for use in the next section that for P-IG(μ, τ) the probability generating function is (e.g., Holla 1966, Willmot 1987)

P(z) = Σ_{y=0}^∞ p(y) z^y = exp(τ^{−1}[1 − {1 − 2τμ(z − 1)}^{1/2}]),   (2.3)

where we have written p(y) for Pr(Y = y). Probabilities may be calculated recursively using the easily established results

p(0) = exp[τ^{−1}{1 − (1 + 2τμ)^{1/2}}],
p(1) = μ(1 + 2τμ)^{−1/2} p(0),
p(y) = [2τμ/(1 + 2τμ)] (1 − 3/(2y)) p(y − 1) + [μ²/{(1 + 2τμ) y (y − 1)}] p(y − 2),   y = 2, 3, ....   (2.4)
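The recursion (2.4) is stable and cheap to evaluate. A direct transcription (the function name is ours), which can be checked against the known mean μ and variance μ + τμ²:

```python
import math
import numpy as np

def pig_pmf(mu, tau, ymax):
    """Return p(0), ..., p(ymax) for P-IG(mu, tau) via the recursion (2.4)."""
    c = 1.0 + 2.0 * tau * mu
    p = np.zeros(ymax + 1)
    p[0] = math.exp((1.0 - math.sqrt(c)) / tau)
    if ymax >= 1:
        p[1] = mu / math.sqrt(c) * p[0]
    for y in range(2, ymax + 1):
        # three-term recursion (2.4); note 1 - 3/(2y) = 1 - 1.5/y
        p[y] = (2.0 * tau * mu / c) * (1.0 - 1.5 / y) * p[y - 1] \
            + mu ** 2 / (c * y * (y - 1)) * p[y - 2]
    return p
```

Truncating at a sufficiently large ymax, the probabilities sum to one and reproduce the first two moments of the distribution.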

3. MAXIMUM-LIKELIHOOD METHODS

The log-likelihood function from a random sample of observations (yᵢ, xᵢ), i = 1, ..., n, is l(β, τ) = Σ_{i=1}^n log pᵢ(yᵢ), where pᵢ(yᵢ) stands for Pr(Y = yᵢ | xᵢ; β, τ). For convenience we write μᵢ = μ(xᵢ; β) and define, for i = 1, ..., n,

tᵢ(y) = (y + 1) pᵢ(y + 1)/pᵢ(y),   y = 0, 1, 2, ....   (3.1)
Manipulation of (2.3), (2.4), and (3.1) shows that l(β, τ) and its first and second derivatives can be expressed in terms of the tᵢ(y). Since pᵢ(yᵢ) = pᵢ(0) ∏_{j=0}^{yᵢ−1} tᵢ(j) / yᵢ!, the log-likelihood is

l(β, τ) = Σ_{i=1}^n { −log yᵢ! + log pᵢ(0) + I(yᵢ > 0) Σ_{j=0}^{yᵢ−1} log tᵢ(j) },

and the score functions u_r = ∂l/∂β_r and u_{k+1} = ∂l/∂τ are

u_r = Σ_{i=1}^n {yᵢ − tᵢ(yᵢ)} μᵢ^{−1} ∂μᵢ/∂β_r,   r = 1, ..., k,   (3.2)

u_{k+1} = Σ_{i=1}^n ∂ log pᵢ(yᵢ)/∂τ,   (3.3)

where, for yᵢ ≥ 1,

∂ log pᵢ(yᵢ)/∂τ = (2τ²)^{−1} {tᵢ(yᵢ)/μᵢ + μᵢ/tᵢ(yᵢ − 1) − 2} − (2τ)^{−1},

and ∂ log pᵢ(0)/∂τ = τ^{−2}{(1 + 2τμᵢ)^{1/2} − 1} − τ^{−1} μᵢ (1 + 2τμᵢ)^{−1/2}. The entries of the observed information matrix I(β, τ) = [−∂²l/∂θ_r ∂θ_s], θ = (β′, τ)′, are

I_rs = Σ_{i=1}^n ( [yᵢ − tᵢ(yᵢ){tᵢ(yᵢ + 1) − tᵢ(yᵢ)}] μᵢ^{−2} ∂μᵢ/∂β_r ∂μᵢ/∂β_s + {tᵢ(yᵢ) − yᵢ} μᵢ^{−1} ∂²μᵢ/∂β_r ∂β_s ),   r, s = 1, ..., k,   (3.4)

I_{r,k+1} = Σ_{i=1}^n {∂tᵢ(yᵢ)/∂τ} μᵢ^{−1} ∂μᵢ/∂β_r,   r = 1, ..., k,   (3.5)

I_{k+1,k+1} = −Σ_{i=1}^n ∂² log pᵢ(yᵢ)/∂τ²,   (3.6)

where ∂tᵢ(y)/∂τ = tᵢ(y){∂ log pᵢ(y + 1)/∂τ − ∂ log pᵢ(y)/∂τ} follows from the expressions accompanying (3.3).


We remark that (3.2) and (3.4) hold quite generally for mixed Poisson models of the form (2.1); Sprott (1965) notes this in the univariate case.
It is possible to have the maximum-likelihood estimate τ̂ = 0, implying that a Poisson regression model is best supported by the data. To avoid problems when τ̂ is zero or close to zero, we find it convenient to maximize l(β, τ) for selected τ-values by solving the likelihood equations u_r(β, τ) = 0 (r = 1, ..., k) to obtain β̂(τ). The profile log-likelihood l(β̂(τ), τ) is then easily obtained and maximized with respect to τ, to yield τ̂ and β̂ = β̂(τ̂). Estimates β̂(τ) are readily found via Newton-Raphson iteration or, alternatively, the scoring algorithm. With regard to the latter and to efficiency calculations in Section 4, we note that the Fisher information matrix entries are found after some algebra to be
A_rs(β, τ) = E(I_rs) = Σ_{i=1}^n (Vᵢ/μᵢ²) ∂μᵢ/∂β_r ∂μᵢ/∂β_s,   r, s = 1, ..., k,   (3.7)

A_{r,k+1}(β, τ) = E(I_{r,k+1}),   r = 1, ..., k,   (3.8)

A_{k+1,k+1}(β, τ) = E(I_{k+1,k+1}),   (3.9)

where Vᵢ = μᵢ − μᵢ²(1 + τ) + E[tᵢ(Yᵢ)²], and the expectations may be evaluated numerically from the probability recursions.

For the widely used log-linear specification μᵢ = exp(xᵢ′β), the formulae (3.2)-(3.9) simplify to some extent; note that ∂μᵢ/∂β_r = x_{ir} μᵢ in this case. We remark also that the values tᵢ(yᵢ) can be computed from the following recursions, which are a direct consequence of (2.4):
tᵢ(0) = μᵢ(1 + 2τμᵢ)^{−1/2},
tᵢ(y) = (2y − 1)τμᵢ/(1 + 2τμᵢ) + tᵢ(0)²/tᵢ(y − 1),   y = 1, 2, ....

Working with the tᵢ(y)'s rather than the pᵢ(y)'s helps avoid roundoff problems.
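Since pᵢ(y) = pᵢ(0) ∏_{j=0}^{y−1} tᵢ(j)/(j + 1), a log-likelihood term can be accumulated directly from the tᵢ recursion. A sketch (all names are ours), cross-checked against the probability recursion (2.4):

```python
import math

def pig_logpmf_t(y, mu, tau):
    """log Pr(Y = y) for P-IG(mu, tau), accumulated from t(0), ..., t(y-1)."""
    c = 1.0 + 2.0 * tau * mu
    logp = (1.0 - math.sqrt(c)) / tau      # log p(0)
    t = mu / math.sqrt(c)                  # t(0)
    for j in range(y):
        logp += math.log(t) - math.log(j + 1)                # p(j+1) = t(j) p(j)/(j+1)
        t = (2 * j + 1) * tau * mu / c + (mu * mu / c) / t   # t(j+1)
    return logp

def pig_logpmf_direct(y, mu, tau):
    """Same quantity via the probability recursion (2.4), for cross-checking."""
    c = 1.0 + 2.0 * tau * mu
    p_prev2 = math.exp((1.0 - math.sqrt(c)) / tau)           # p(0)
    if y == 0:
        return math.log(p_prev2)
    p_prev1 = mu / math.sqrt(c) * p_prev2                    # p(1)
    for k in range(2, y + 1):
        p_prev2, p_prev1 = p_prev1, (
            (2.0 * tau * mu / c) * (1.0 - 1.5 / k) * p_prev1
            + mu * mu / (c * k * (k - 1)) * p_prev2
        )
    return math.log(p_prev1)
```

Both routes give the same probabilities; the t-based form works on the log scale throughout, which is what makes it resistant to roundoff.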
Confidence intervals and tests about parameters can be obtained by using familiar asymptotic χ² approximations for likelihood-ratio statistics, or by treating (β̂ − β, τ̂ − τ) as approximately normally distributed with mean 0 and covariance matrix either I(β̂, τ̂)^{−1} or A(β̂, τ̂)^{−1}. We have not studied whether one covariance-matrix estimate might be preferable to the other, but the observed information is more easily computed and we have used it in Section 5. When τ > 0, limiting distributions which yield these approximations arise under mild conditions as n → ∞, and also for fixed n and τ as the μᵢ's → ∞. Likelihood-ratio and normal-approximation confidence intervals for β generally appear to be in good agreement and satisfactory for practical purposes, provided that the μᵢ's and n are not both small; the likelihood-ratio method is preferable when the two disagree. The same is true for inferences about τ, with the additional proviso that when τ is close to zero, intervals by either approach are inaccurate. Based on limited empirical results, we suggest as a rough practical guideline that when a 0.95 or 0.99 confidence interval for τ includes zero, one should expect that the right limit of the interval is somewhat too small.
We remark that results of Stein, Zucchini, and Juritz (1987) or Willmot (1988), showing for P-IG(μ, τ) that τ and (τ^{−2} + 2μτ^{−1})^{1/2} are orthogonal parameters, imply that β̂ and τ̂ will have low asymptotic correlation when the μᵢτ values are small. This is often the case, and has been reflected in the estimated covariance matrices I(β̂, τ̂)^{−1} or A(β̂, τ̂)^{−1} in data sets we have examined.
Finally, tests of the hypothesis τ = 0 are often of interest, since this corresponds to a Poisson model. A test may be based on the likelihood-ratio statistic

Λ = 2l(β̂, τ̂) − 2l(β̂(0), 0).

Under H₀: τ = 0 this has an asymptotic distribution with a probability mass of 0.5 at Λ = 0, and a 0.5χ²₁ (half-χ²) distribution for Λ > 0 (Chernoff 1954). When n and the μᵢ's are both small, this limiting distribution is a poor approximation to the actual distribution of Λ. Dean and Lawless (1989a) discuss other approaches which can be used then.
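In code, the p-value of an observed statistic follows directly from this boundary mixture: for Λ > 0 it is 0.5 P(χ²₁ ≥ Λ), and P(χ²₁ ≥ λ) = erfc(√(λ/2)). A small helper (our own naming):

```python
import math

def half_chisq_pvalue(lam):
    """p-value for the likelihood-ratio test of tau = 0 against tau > 0.

    Under H0 the statistic is 0 with probability 0.5 and behaves like
    chi-squared with 1 d.f. otherwise, so for lam > 0 the p-value is
    0.5 * P(chi2_1 >= lam).
    """
    if lam <= 0.0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(lam / 2.0))
```

One consequence is that the one-sided 5% critical value is the 0.90 quantile of χ²₁ (about 2.71) rather than the usual 3.84.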

4. QUASILIKELIHOOD-MOMENT ESTIMATION

Weighted least squares, or quasilikelihood, is often used for the regression analysis of count data. Such methods are popular because they involve familiar iterative least-squares calculations and can be carried out with readily available software. They also possess a degree of robustness to misspecification of the distribution of v in (2.1). We examine these methods briefly here to see whether they are efficient when the model (2.1) is indeed P-IG(μ(xᵢ), τ).
The quasilikelihood equations for β (e.g., McCullagh and Nelder 1983) are

Σ_{i=1}^n (yᵢ − μᵢ) σᵢ^{−2} ∂μᵢ/∂β_r = 0,   r = 1, ..., k,   (4.1)

where σᵢ² = Var(Yᵢ) = μᵢ + τμᵢ². An additional equation is needed to allow estimation of τ; one that is often used is

Σ_{i=1}^n (yᵢ − μᵢ)²/σᵢ² − (n − k) = 0   (4.2)

(e.g., Breslow 1984, McCullagh and Nelder 1983). Dean (1988) shows that it is preferable to use (4.1) combined with the equation

Σ_{i=1}^n {(yᵢ − μᵢ)² − yᵢ − τμᵢ²}/(1 + τμᵢ)² = 0   (4.3)

to estimate β and τ. This is motivated by a study of quadratic estimating equations (Crowder 1987, Godambe and Thompson 1989) for this problem. The equations may be solved conveniently by first fitting the Poisson model (τ = 0) to obtain initial estimates β̃ and μ̃ᵢ (i = 1, ..., n), and then inserting these in (4.3) and solving for τ̃. If the solution τ̃ is positive, the process is repeated using τ̃ in (4.1) to obtain a new β̃, and so on, iterating until convergence. In some cases it may be that τ̃ = 0, representing a Poisson model.
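The alternating scheme just described is straightforward to implement for a log-linear mean μᵢ = exp(xᵢ′β): Fisher scoring for (4.1) at the current τ, then a scalar root-finding step for (4.3). A sketch under our own naming (the simulated-data check and all identifiers are ours, not from the paper):

```python
import numpy as np

def ql_moment_fit(y, X, n_iter=20):
    """Quasilikelihood-moment estimation for a mixed Poisson log-linear model.

    Alternates between the quasilikelihood equations (4.1) for beta, with
    Var(Y_i) = mu_i + tau * mu_i^2, and the moment equation (4.3) for tau.
    """
    n, k = X.shape
    beta, tau = np.zeros(k), 0.0
    for _ in range(n_iter):
        for _ in range(50):  # Fisher scoring for beta at the current tau
            mu = np.exp(X @ beta)
            score = X.T @ ((y - mu) / (1.0 + tau * mu))
            info = (X * (mu / (1.0 + tau * mu))[:, None]).T @ X
            step = np.linalg.solve(info, score)
            beta += step
            if np.max(np.abs(step)) < 1e-10:
                break
        mu = np.exp(X @ beta)

        def eq(t):  # left-hand side of (4.3)
            return np.sum(((y - mu) ** 2 - y - t * mu ** 2) / (1.0 + t * mu) ** 2)

        if eq(0.0) <= 0.0:
            tau = 0.0        # no extra-Poisson variation indicated
        else:
            lo, hi = 0.0, 10.0
            for _ in range(60):  # bisection; eq is decreasing in t
                mid = 0.5 * (lo + hi)
                lo, hi = (mid, hi) if eq(mid) > 0.0 else (lo, mid)
            tau = 0.5 * (lo + hi)
    return beta, tau
```

On data simulated from the model the estimates recover (β, τ), and the fitted values can then start the maximum-likelihood iteration of Section 3.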
The asymptotic covariance matrix of the estimator (β̃, τ̃) given by the solution to (4.1) with (4.3) can be obtained using general results on estimating equations (e.g., Inagaki 1973) or on quadratic estimating equations (Crowder 1987). The limiting distribution of (β̃ − β, τ̃ − τ) is normal with covariance matrix of the form

n^{−1} ( F^{−1}                              b_{k+1}^{−1} F^{−1}(c − b)
         b_{k+1}^{−1} (c − b)′ F^{−1}        b_{k+1}^{−2} (c_{k+1} − 2c′F^{−1}b + b′F^{−1}b) ),   (4.4)


TABLE 1: Asymptotic relative efficiencies of quasilikelihood-moment estimation in a Poisson-inverse-Gaussian regression model.

   τ        β̃₀ vs. β̂₀    β̃₁ vs. β̂₁    τ̃ vs. τ̂
   0          1.000         1.000         1.000
   0.005      1.000         1.000         1.000
   0.01       1.000         1.000         1.000
   0.05       1.000         0.999         1.000
   0.10       1.000         0.995         0.961
   0.20       1.000         0.978         0.860
   0.50       1.000         0.914         0.641

where F (k × k) has (r, s) entry

F_rs = lim_{n→∞} n^{−1} Σ_{i=1}^n σᵢ^{−2} ∂μᵢ/∂β_r ∂μᵢ/∂β_s,

and b = (b₁, ..., b_k)′, c = (c₁, ..., c_k)′, b_{k+1}, and c_{k+1} are given by

b_r = lim_{n→∞} n^{−1} Σ_{i=1}^n 2τμᵢ (1 + τμᵢ)^{−2} ∂μᵢ/∂β_r,   r = 1, ..., k,

b_{k+1} = lim_{n→∞} n^{−1} Σ_{i=1}^n μᵢ² (1 + τμᵢ)^{−2},

c_r = lim_{n→∞} n^{−1} Σ_{i=1}^n (γ₁ᵢ σᵢ³ − σᵢ²) σᵢ^{−2} (1 + τμᵢ)^{−2} ∂μᵢ/∂β_r,   r = 1, ..., k,

c_{k+1} = lim_{n→∞} n^{−1} Σ_{i=1}^n {(γ₂ᵢ − 1)σᵢ⁴ − 2γ₁ᵢσᵢ³ + σᵢ²} (1 + τμᵢ)^{−4},

where γ₁ᵢ = {(1 + 2τμᵢ)σᵢ² + τ²μᵢ³}/σᵢ³ and γ₂ᵢ = 3 + (μᵢ + 7τμᵢ² + 18τ²μᵢ³ + 15τ³μᵢ⁴)/σᵢ⁴ are the skewness and kurtosis coefficients for P-IG(μᵢ, τ). Variance estimates for finite n are obtained by inserting parameter estimates into (4.4) with all "lim_{n→∞}" omitted in the expressions. It can be shown, as in Lawless (1987), that the estimator β̃ and its estimated variance are consistent provided the model is mixed Poisson, and consequently are robust to departure of the mixing distribution from inverse-Gaussian form.
An example in Section 5 compares the estimates β̃, τ̃ based on (4.1) and (4.3) with maximum-likelihood estimates β̂, τ̂ for a set of data. We have also examined the asymptotic efficiency of (β̃, τ̃) relative to (β̂, τ̂) for a variety of regression situations as follows: (a) k = 1, μ = 10; (b) k = 1, μ = 40; (c) k = 2, μᵢ = exp(β₀ + β₁xᵢ), β₁ = 1, exp(β₀) = 10, with 1/5 of the xᵢ's at each of −1, −0.5, 0, 0.5, 1; (d) k = 2, μᵢ = exp(β₀ + β₁xᵢ), β₁ = 1, exp(β₀) = 10, with 1/3 of the xᵢ's at each of −1, 0, 1; (e) same as (d) except that β₁ = 0.5, exp(β₀) = 10; (f) same as (d) except β₁ = 0.5, exp(β₀) = 50. As an illustration, Table 1 shows results for case (e) (the μᵢ-values corresponding to


TABLE 2: Classificatory factors for Swedish motor-insurance data.

DISTANCE  Kilometers travelled (5 classes)
  (1) less than 1000 km per year
  (2) 1000-15,000 km per year
  (3) 15,000-20,000 km per year
  (4) 20,000-25,000 km per year
  (5) more than 25,000 km per year
BONUS  No-claims bonus (7 classes)
  Insured starts in the class BONUS = 1 and is moved up one class (to a maximum of 7) each year there is no claim.
MAKE  9 specified car makes.

x = −1, 0, 1 are then approximately 6.1, 10, and 16.5). Several values of τ are considered, and the table shows asymptotic relative efficiencies for β̃₀, β̃₁, and τ̃, defined as the ratio of the asymptotic variance for the maximum-likelihood estimator of each parameter to the asymptotic variance for the quasilikelihood-moment estimator. The results shown in the table illustrate features found in all of the situations.

For the regression situations the efficiencies of β̃ were all larger than 0.9 for τ ≤ 0.5. With moderate amounts of extra-Poisson variation (i.e. τ not too large) the efficiency of τ̃ is very high. As τ increases, the efficiency of τ̃ drops off faster than that of β̃. In practice, however, relevant values of τ tend to be fairly small; unless the μᵢ's are very small, estimates of τ are usually under 0.50. Efficiencies tend to be slightly higher when the regression effect is smaller or when there is less variation in the xᵢ's (when μ₁ = ... = μₙ = μ, β̃ is asymptotically fully efficient).
If one is certain about the appropriateness of the P-IG model, then the maximum-likelihood estimation of Section 3 is of course preferred. However, the estimation procedure embodied in (4.1) and (4.3) is a simple practical alternative, and can also be used to get starting values for maximum-likelihood iteration. In addition, it is more robust than maximum likelihood, in providing consistent estimates of β and τ when the mean and variance functions are correctly specified, even though the distribution of Y may not be Poisson-inverse-Gaussian. Finally, it is comforting to know that this general estimation method, which is used by many statisticians, has quite high efficiency for the P-IG model. Dean and Lawless (1989b) show that estimates from (4.1) and (4.3) also have high efficiency under a negative-binomial model.

5. AN EXAMPLE

Andrews and Herzberg (1985, Table 68.1) present data on Swedish third-party motor insurance in 1977 for Stockholm, Göteborg, and Malmö, obtained from a committee study of risk premiums in motor insurance. Three factors are thought to be important in modeling the occurrence of claims; these are listed in Table 2. The data give the total number of claims Y for automobiles insured in each of the 315 risk groups defined by a combination of DISTANCE, BONUS, and MAKE factor levels. For each group there is also an "exposure" T, which is the number of insured automobile-years for that group, in units of 10⁵.

We investigated the fit of Poisson and mixed Poisson log-linear models to these claims

TABLE 3: Fits of several models to the Swedish motor-insurance data.

   Model                                                          Pearson statisticᵃ   d.f.
1. Poisson (main effects)                                              485.1           296
2. Poisson (main effects; observations 174, 180, 183 deleted)          355.0           293
3. Poisson (main effects, MAKE = 9 deleted)                            361.6           262
4. Poisson (main effects, MAKE = 9 and
   observations 8, 174, 183, 184 deleted)                              274.7           258
5. P-IG(μᵢ, τ) (main effects)                                          319.7           295
6. P-IG(μᵢ, τ/Tᵢ) (main effects)                                       331.5           295

ᵃFor models 1, 2, 3, and 4 the Pearson statistic is Σ(yᵢ − μ̂ᵢ)²/μ̂ᵢ; for model 5 it is Σ(yᵢ − μ̂ᵢ)²/{μ̂ᵢ(1 + τ̂μ̂ᵢ)}, and for model 6 it is Σ(yᵢ − μ̂ᵢ)²/{μ̂ᵢ(1 + τ̂μ̂ᵢ/Tᵢ)}.

data. For the Poisson models Yᵢ, the number of claims in the ith group (i = 1, ..., 315), was assumed to be Poisson with mean μᵢ = Tᵢ exp(xᵢ′β), where Tᵢ was the exposure for the ith group and the covariates xᵢ were chosen to represent levels of the factors in Table 2, and possibly interactions. A Poisson model with covariates only for factor main effects did not fit well, and there was evidence of extra-Poisson variation or lack of fit (see Table 3). We also examined factor interactions, but were unable to find a parsimonious Poisson model that gave a good fit. Figure 1 shows a normal probability plot of the Pearson residuals (yᵢ − μ̂ᵢ)/μ̂ᵢ^{1/2} for the main-effects Poisson model. The shape of the plot, with residuals tending to be uniformly larger in magnitude than expected, suggests overdispersion. Examination of various other residual plots did not reveal systematic evidence for lack of fit due to an incorrect specification of the Poisson mean. Three cases (groups 174, 180, and 183 in the data set) had particularly large residuals of 6.22, 6.24, and 5.61 respectively. After deleting these observations and refitting the Poisson model, we still find evidence for overdispersion (see Table 3). We observe also that the category MAKE = 9 is actually not one particular car make, but rather includes any make not in the categories MAKE = 1 to MAKE = 8. If we drop the 35 groups with MAKE = 9 and in addition observations 8, 174, 183, 184, then (see Table 3) the resulting Poisson model fits well except that extreme residuals tend to be smaller than expected.
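The diagnostics above are built from Pearson residuals under the three variance functions in play. A minimal helper (our own naming):

```python
import numpy as np

def pearson_residuals(y, mu, tau=0.0, T=None):
    """Pearson residuals (y - mu)/sd(Y).

    tau = 0 gives the Poisson case; otherwise the variance is
    mu * (1 + tau * mu), or mu * (1 + tau * mu / T) for the
    exposure-dependent model with Var(v_i) = tau / T_i.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    disp = 1.0 + (tau * mu if T is None else tau * mu / np.asarray(T, dtype=float))
    return (y - mu) / np.sqrt(mu * disp)
```

A normal probability plot of these residuals then shows overdispersion as points that are uniformly larger in magnitude than the standard normal line.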
It is difficult to assign sources for the lack of fit of the Poisson regression models, and a sensible alternative approach is to fit mixed Poisson models (2.1), which can be thought of as incorporating random effects vᵢ representing additional variability. Either a model (2.1) with Var(vᵢ) = τ or one with Var(vᵢ) = τ/Tᵢ seems plausible, the former because there may be heterogeneity associated with automobiles with certain characteristics, and the latter because there may be heterogeneity arising from the individual automobiles which make up a group. Both Poisson-inverse-Gaussian models of this kind fit quite well: see Table 3 and Figures 2 and 3, which portray normal probability plots of Pearson residuals (yᵢ − μ̂ᵢ)/{μ̂ᵢ(1 + τ̂μ̂ᵢ)}^{1/2} and (yᵢ − μ̂ᵢ)/{μ̂ᵢ(1 + τ̂μ̂ᵢ/Tᵢ)}^{1/2} respectively for the two models. We remark that observations 174 and 183 still have large residuals (5.30 and −3.89) under the first model, and 174, 180, and 183 still have large residuals (5.55, 5.11, and −4.92) under the second model. Although they give reasonable fits, we note that the Pearson-statistic values (see Table 3) and Figures 2 and 3 suggest a slightly better fit for the first model with Var(vᵢ) = τ than for the second.

Table 4 shows maximum-likelihood estimates of main effects and their standard errors under the P-IG(μᵢ, τ) model, computed as outlined in Section 3. Estimates and standard
FIGURE 1: Normal probability plot of the Pearson residuals for the main-effects Poisson model, Model 1. There are eight points placed on the boundaries of the x-axis (ordered residuals) whose values are actually outside of its limits.

FIGURE 2: Normal probability plot of the Pearson residuals for the main-effects Poisson-inverse-Gaussian model, Model 5. There are two points placed on the boundaries of the x-axis (ordered residuals) whose values are actually outside of its limits.

TABLE 4: Parameter estimates and standard errors for the (a) quasilikelihood and (b) maximum-likelihood fits to the insurance data.

                        (a)                        (b)
Parameter        Estimate   Std. Error     Estimate   Std. Error
τ                  0.0113     0.0029         0.0115     0.0035
Intercept         -1.719      0.050         -1.716      0.050
DISTANCE (2)       0.171      0.037          0.170      0.037
DISTANCE (3)       0.230      0.040          0.226      0.040
DISTANCE (4)       0.282      0.047          0.280      0.048
DISTANCE (5)       0.528      0.048          0.528      0.049
BONUS (2)         -0.551      0.051         -0.551      0.051
BONUS (3)         -0.698      0.052         -0.698      0.052
BONUS (4)         -0.910      0.054         -0.910      0.055
BONUS (5)         -0.999      0.053         -1.000      0.053
BONUS (6)         -1.046      0.048         -1.049      0.048
BONUS (7)         -1.481      0.042         -1.483      0.043
MAKE (2)           0.159      0.057          0.159      0.057
MAKE (3)          -0.142      0.062         -0.149      0.063
MAKE (4)          -0.512      0.065         -0.513      0.065
MAKE (5)           0.110      0.060          0.110      0.060
MAKE (6)          -0.415      0.057         -0.416      0.057
MAKE (7)          -0.154      0.070         -0.155      0.070
MAKE (8)           0.102      0.093          0.100      0.093
MAKE (9)          -0.029      0.038         -0.029      0.039

(The first level of each factor is the baseline.)

errors obtained from the quasilikelihood method-of-moments procedure embodied in Equations (4.1) and (4.3) are also shown. These are remarkably close to the maximum-likelihood estimates. We have found that generally iterations for the quasilikelihood-moment estimates converge rapidly, and that it is useful to use them as starting values for the maximum-likelihood iterative procedure.

ACKNOWLEDGEMENT

The authors thank J.D. Kalbfleisch, two anonymous referees, and the editor for constructive comments.

REFERENCES

Andrews, D.F., and Herzberg, A.M. (1985). Data. Springer-Verlag, New York.
Anscombe, F. (1950). Sampling theory of the negative binomial and logarithmic series distributions. Biometrika, 37, 358-382.
Breslow, N. (1984). Extra-Poisson variation in log-linear models. Appl. Statist., 33, 38-44.
Brillinger, D.R. (1986). The natural variability of vital rates and associated statistics (with Discussion). Biometrics, 42, 693-711.
Chernoff, H. (1954). On the distribution of the likelihood ratio. Ann. Math. Statist., 25, 573-578.
Crowder, M. (1987). On linear and quadratic estimating functions. Biometrika, 74, 591-597.
Dean, C. (1988). Mixed Poisson models and regression methods for count data. Ph.D. Thesis, University of Waterloo.
Dean, C., and Lawless, J.F. (1989a). Testing for overdispersion in Poisson regression models. J. Amer. Statist. Assoc., 84, 467-472.
Dean, C., and Lawless, J.F. (1989b). Comments on "An extension of quasilikelihood estimation", by Godambe and Thompson. J. Statist. Plann. Inference, 22, 155-158.
Engel, J. (1984). Models for response data showing extra-Poisson variation. Statist. Neerlandica, 38, 159-167.
Folks, J.L., and Chhikara, R.S. (1978). The inverse Gaussian distribution and its statistical application: a review. J. Roy. Statist. Soc. Ser. B, 40, 263-289.
Godambe, V.P., and Thompson, M.E. (1989). An extension of quasi-likelihood estimation. J. Statist. Plann. Inference, 22, 137-152.
Hinde, J. (1982). Compound Poisson regression models. GLIM 82: Proc. Internat. Conf. Generalized Linear Models (R. Gilchrist, ed.), Springer-Verlag, Berlin, 109-121.
Holla, M.S. (1966). On a Poisson-inverse Gaussian distribution. Metrika, 11, 115-121.
Inagaki, N. (1973). Asymptotic relations between the likelihood estimating function and the maximum likelihood estimator. Ann. Inst. Statist. Math., 25, 1-26.
Jorgensen, B. (1987). Exponential dispersion models (with Discussion). J. Roy. Statist. Soc. Ser. B, 49, 127-162.
Lawless, J.F. (1987). Negative binomial and mixed Poisson regression. Canad. J. Statist., 15, 209-226.
McCullagh, P., and Nelder, J.A. (1983). Generalized Linear Models. Chapman and Hall, London.
Ord, J.K., and Whitmore, G.A. (1986). The Poisson-inverse Gaussian distribution as a model for species abundance. Comm. Statist. A, 15, 853-871.
Sankaran, M. (1968). Mixtures by the inverse Gaussian distribution. Sankhyā Ser. B, 30, 455-458.
Sichel, H.S. (1971). On a family of discrete distributions particularly suited to represent long-tailed frequency data. Proc. 3rd Symp. Math. Statist. (N.F. Loubscher, ed.), CSIR, Pretoria.
Sprott, D.A. (1965). Some comments on the question of identifiability of parameters raised by Rao. Classical and Contagious Discrete Distributions (G.P. Patil, ed.), Statistical Publishing Society, Calcutta, 333-336.
Stein, G.Z., and Juritz, J.M. (1988). Linear models with an inverse-Gaussian error distribution. Comm. Statist. Theory Methods, 17, 557-571.
Stein, G., Zucchini, W., and Juritz, J. (1987). Parameter estimation for the Sichel distribution and its multivariate extension. J. Amer. Statist. Assoc., 82, 938-944.
Tweedie, M.C.K. (1957). Statistical properties of inverse Gaussian distributions, I. Ann. Math. Statist., 28, 362-372.
Willmot, G.E. (1987). The Poisson-inverse Gaussian distribution as an alternative to the negative binomial. Scand. Actuar. J., 113-127.
Willmot, G.E. (1988). Parameter orthogonality for a family of discrete distributions. J. Amer. Statist. Assoc., 83, 517-521.

Received 21 January 1988
Revised 13 January 1989
Accepted 6 March 1989

Department of Statistics and Actuarial Science
University of Waterloo,
Waterloo, Ontario N2L 3G1
