

Negative binomial regression, second edition

Article February 1994


DOI: 10.1017/CBO9780511973420 Source: RePEc


1 author:

Joseph Hilbe
Arizona State U and U of Hawaii

All content following this page was uploaded by Joseph Hilbe on 23 December 2014.



ERRATA
UPDATE as of Nov 24, 2007

Negative Binomial Regression


Cambridge University Press

Joseph M. Hilbe
Arizona State University
Hilbe@asu.edu; jhilbe@aol.com

The book was first released for sale on July 29, 2007 at the Joint Statistical Meetings
(JSM) of the American Statistical Association, held at Salt Lake City, UT.

I have read through the text and identified errors that were overlooked during the editing
process. I apologize for these oversights. If readers find other errors, please contact me
and I shall post them to this site. I hope that these errors can be corrected in the second
printing of the text.

Below the list of Errata, I provide a few thoughts regarding the discussion found in the
text, perhaps giving you a better insight into the reason I wrote it, to whom it is directed,
and thoughts about the statistics involved. I finished writing the main part of the text in
2006; the book was completed in early 2007. The subject has advanced during the
interval, and I wish to provide a brief update.

I have rewritten large parts of Chapter 10 in light of recent advances in the area, in
particular for GEE models. You can download the revised Chapter 10 at:
http://www.statistics.com/other/hilbe/index.php

Both this Errata page and the various data sets and user authored statistical commands for
examples used in the text are posted to the web site for the book at:
http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=9780521857727
Find the place on the lower left side of the web site to access the 33 data sets identified in
Appendix E as well as Stata commands used to create examples in the text. Data files are
available in the following formats: Stata, SAS, SPSS, Excel, R, and Limdep. I am posting
Stata files in both version 10 format as well as in the older version 8-9 format. Users of
Stata 10 can read files saved in older versions, but users of versions under 10 cannot read
files saved using version 10. Details of these files, as well as of the other formats, are
explained on the web site.

Several of the examples used in the text were modeled using Stata commands not found
in commercial Stata or on the text's web site. These commands should mostly be found
at the following web site:
http://ideas.repec.org/s/boc/bocode.html
The majority of the relevant commands are from 2005, associated with my name. The site
allows for easy searching. Others not on the site are posted to this book's web site.

I should give a quick note as to my intended audience. The book is not directed to
professional statisticians, although many will likely find new and hopefully interesting
information in the text. Rather, the book was primarily written for those researchers who
have little background in count response models, but who find that they have a need to
learn about them for an upcoming project or study. I have written the text in as clear a
manner as possible, many times re-emphasizing important items that need to be
remembered in the modeling process. My tone is more like a classroom presentation
than a formal text. I have attempted to speak directly to the reader, giving advice as
to the comparative modeling process --- outlining the algorithmic basis of the respective
count models, detailing different methods of estimation, selecting the appropriate model,
assessing fit, interpreting parameter estimates and ancillary parameters, and so forth.
Examples are given for each model discussed.

The end result is a book that can prove useful to researchers, to graduate students who
need to have a workable understanding of count models, as well as anyone else who is
simply interested in this area of statistical modeling. The focus of discussion is negative
binomial regression, which the reader will find designates a broad range of models.

I have primarily used Stata throughout the text for examples, with Limdep being used for
examples where no Stata command yet exists. Stata is one of the most popular statistical
applications worldwide, and has commands that accommodate nearly every negative
binomial model discussed in this text. I am including user-authored commands that are
easily attainable as well. Stata is second only to Limdep in the range of count response
model offerings. SAS, SPSS, S-Plus, Statistica, Genstat, and other popular commercial
packages have only minimal count model capabilities. R has many user-authored count models, but
not nearly as many as found in Limdep or Stata. Moreover, I have previously authored a
number of published statistical procedures using the Stata language, the majority of
which relate to count response modeling. Stata was therefore the reasonable choice for
displaying example output.

ERRATA
PREFACE, Page x: 2nd line of 3rd complete paragraph.
Web address:
www.cambridge.org/XXXXX should read www.cambridge.org/9780521857727.
This latter correct address is found in Appendix E (p. 240).

INTRODUCTION, Page 2: last sentence of first paragraph:


The SPSS program name is mistaken. It is now GENLIN, not GLZ.

Ch 2, Page 20, 1st paragraph, line 8:


Add "family", so that the line reads:
link, variance, and family functions. The algorithm took care

CH 2, Page 21: line 2:
First word of line reads "or". It should read "and".

Ch 2, Page 22: Equation 2.6: Should read: X1 = X0 - f(X0)/f'(X0)


+ : Equation 2.8 : 2nd equation has a missing = sign. Should read
2L = 2L/
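
As a minimal illustration of the Newton-Raphson update that Equation 2.6 expresses, x1 = x0 - f(x0)/f'(x0), the following Python sketch applies the step repeatedly until it stops changing. The function name newton_raphson, the tolerance, and the example function f(x) = x^2 - 2 are illustrative assumptions and do not come from the text.

```python
def newton_raphson(f, fprime, x0, tol=1e-12, max_iter=50):
    """Repeat the update x1 = x0 - f(x0)/f'(x0) until the step is below tol."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)   # f(x0)/f'(x0)
        x = x - step              # x1 = x0 - f(x0)/f'(x0)
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical example: the positive root of f(x) = x^2 - 2
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

Without the minus sign the update would move away from the root rather than toward it.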

CH 2, Page 27, Table 2.1, AIC statistic note:


The note to the program line defining the AIC statistic reads:
/* AIC is sometimes defined w/o η */. The last term, η, should read n, not η.
See comments below.

Ch 3, page 40, 1st line under equation 2.53: "caste" should be spelled "cast".

CH 3, page 47, equation 3.23:


There should be an equal sign between Zi and the other terms

CH 4, Page 52, second to last line of program code on page. The line should read:
. gen xb = 1 + .5*x1 - .75*x2 + .25*x3

CH 4, Page 63: last line of code in Table 4.1:


It should read: "w = 1/sqrt(sc)" instead of "w = /sqrt"

CH 4, Page 73, 6th line of Stata commands: should read:


. di 20.02131 + .3*20.02312^2 /* mean + alpha*mean^2 */

CH 5, Page 78. Table 5.1: The last two items should be numbered 24 and 25, and not 22
and 22.

CH 5, Page 82:Figures 5.19 and 5.20: Choose functions should not have division sign
between top and bottom terms.

CH 5, Page 94: value of variance under the word hence:


In the list of formulae there are two equations with a left hand side of V(μ). The 2nd of
these two equations should read V'(μ) instead of V(μ). Therefore V'(μ) = 1 + 2αμ
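
A quick numerical check of this correction, sketched in Python under the assumption that the variance function in question is the negative binomial form mean + alpha*mean^2: its derivative with respect to the mean is 1 + 2*alpha*mean. The function names and the values mean = 20, alpha = 0.3 are illustrative assumptions.

```python
def nb_variance(mu, alpha):
    """Negative binomial variance function: V(mu) = mu + alpha * mu^2."""
    return mu + alpha * mu ** 2


def nb_variance_deriv(mu, alpha):
    """Derivative of V with respect to mu: V'(mu) = 1 + 2 * alpha * mu."""
    return 1.0 + 2.0 * alpha * mu


# Hypothetical values, chosen only for illustration
mu, alpha, h = 20.0, 0.3, 1e-5

# Central finite difference approximation to V'(mu)
numeric = (nb_variance(mu + h, alpha) - nb_variance(mu - h, alpha)) / (2 * h)
analytic = nb_variance_deriv(mu, alpha)
print(numeric, analytic)
```

The two printed values should agree to several decimal places, which distinguishes V'(μ) from V(μ) itself.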

CH 5, Page 95: table 5.5, 1st line in loop: Should read:


w = (/)+(y-){/(1+2+2+2)}

CH 5, Page 96: Section 5.5, line 6:


Should be "two-parameter", not "one-parameter". The line should read:
Poisson variance and μ²/ν the two-parameter gamma distribution variance. We

CH 5, page 96: Section 5.5, Paragraph 3, line 2. Delete comma between "version" and
"of".

CH 6, Page 112: Formula for the negative binomial variance is mistaken.
Should read: μ + αμ²
The formula as it currently reads is missing a factor in the 2nd term.

CH 6, Page 119: Example 3, first line: "log" rather than "data"


Should read:
These data come from the 1912 Titanic survival log. It consists

CH 6, Page 128: line under first table display on top of page:


There is now a ? at the end of the line. It should be deleted.

CH 7, Page 136: Paragraph 2, line 4, first word:


Word "variance" should read "assumptions". Line 4 of 2nd paragraph should start out as
assumptions of the Poisson distribution. Other models are

CH 7, Page 139, line 3 under table of parameter estimates, 1st word of line:
Word "restricted" should be changed to "expected". The line should read, in part:
expected for a geometric model. Recall that unless

CH 7, Page 143, 2nd last line of last full paragraph.


Replace term Figure 7.1 with Figure 7.2.

CH 7, Page 153, last two lines on page. Should read:


-2{(-60322.021) - (-60258.97)}
126.102

CH 7, Page 154, second line top paragraph: Should read:


Chi2tail(1,126.102) = 2.921e-29.
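
The two corrected values can be reproduced with a short Python sketch. It assumes only that the statistic is -2 times the difference of the two log-likelihoods quoted above, and it uses the identity that the upper-tail probability of a chi-square variable with 1 degree of freedom equals erfc(sqrt(x/2)). The variable names are illustrative; which model each log-likelihood belongs to is not stated here, so they are labeled neutrally.

```python
import math

# Log-likelihood values as quoted in the erratum for page 153
ll_first = -60322.021
ll_second = -60258.97

# Likelihood ratio statistic: -2 * (difference of log-likelihoods)
lr = -2.0 * (ll_first - ll_second)

# Upper-tail chi-square probability with 1 degree of freedom:
# P(X > x) = erfc(sqrt(x / 2)), the same quantity as Stata's chi2tail(1, x)
p_value = math.erfc(math.sqrt(lr / 2.0))

print(round(lr, 3), p_value)
```

The statistic rounds to 126.102, and the tail probability should land close to the 2.921e-29 given in the correction.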

CH 7, Page 155, 7th-6th line from bottom: Yang, Hardin, and Addy (2006), not 2007.

CH 8, Page 171, bottom-page model output. Negative binomial age3 coefficient should
read: .023721. The decimal point was inadvertently dropped.

CH 8, Page 172, AIC Formula should read


AIChurdle = ((AICZT * (1-N>0/N) + AICbinary

CH 8, Page 174, Final full paragraph, final 4 lines:


The three terms predicting zero counts are not parameterized in terms of xβ, as the
two formulae mid-page are. To be consistent, the discussion should be in terms
of xβ in both places. The last 4 lines should read:
zero count are; (1) logistic inverse link, i.e. 1/(1+exp(-xβ)), the prediction that y==1, (2)
1 - [1/(1+exp(-xβ))], and (3) the negative

CH 9, Page 180, Section 9.1.1, 2nd paragraph, line 3:
"lower" should read "higher", and the comma should change to a semicolon. The line should read:
wish to extend C to any higher value in the observed distribution; the value to

CH 9, Page 181, Header for bottom table: Should read: POISSON: DROPPED
VALUES 0-3

CH 9, Page 193, line 2 under top table of parameter estimates:


"selection" instead of "selected". The sentence should therefore read:
to the selection corrected Poisson.

CH 10, Pages 213-225, panel models using the progabide data.


It is preferable to use the i(id) and t(t) options rather than those used in the text. Both make sense,
but using the above two options is more sensible and results in a more readily interpreted model.

CH 10, Page 226, Section 10.5, line 1:


"also" should read "sometimes". Line should read:
Multilevel models are sometimes called hierarchical models, particularly

In fact, for clarification purposes, I would rather add an additional sentence after the first.
The first three sentences of the paragraph should then read:
Multilevel models are sometimes called hierarchical models, particularly in educational
and social science research. However, the majority of statisticians now tend to draw a
distinction between multilevel and hierarchical models, primarily because of the manner
in which the methods define order of levels, or nesting. Regardless, the idea behind
multilevel modeling is to model

NOTE: Chapter 10 has been rewritten and can be downloaded from:


http://www.statistics.com/other/hilbe/index.php
Many of the comments made above and below related to chapter 10 are accommodated in
the new Chapter. See comments on the new chapter at the end of this document.

COMMENTS ON DISCUSSION
Page 27: AIC and BIC statistics.
First note that text between /* and */ is comment, and not processed by the algorithm.
The problem here is the last term, η. It should read n, not η. n represents the number of
observations in the model; η is the link function, which is also the linear predictor.

The book also implies in various places that the BIC statistic can only be defined using
the deviance function, not the log-likelihood. I don't believe that I actually stated this, but
I think it is implied. Of course, this is not the case; the BIC can be defined in terms of the
log-likelihood: BIC = -2{LL - k*ln(k)}/n, where k is the number of model predictors
and n the number of model observations. LL is the log-likelihood function.
The model having the lowest value for its BIC statistic is the preferred model, fitting
better than the others. The degree of model preference is based on the absolute difference
between the BIC statistics of two models. A table of preferences can be shown as:
|difference|    Degree of preference
-----------------------------------------------
0 - 2           Weak
2 - 6           Positive
6 - 10          Strong
> 10            Very Strong

Models A and B:
If BICA - BICB < 0, then A preferred
If BICA - BICB > 0, then B preferred
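
The comparison rule and the table of preferences can be sketched in Python as follows. The function name bic_preference is hypothetical. The cut-points 2, 6, and 10, and the choice to assign a boundary value to the lower category, are assumptions of this sketch.

```python
def bic_preference(bic_a, bic_b):
    """Return (preferred model, degree of preference) for two BIC values.

    The model with the lower BIC is preferred; the degree is read from the
    absolute difference. Cut-points 2, 6, 10 are assumed, with boundary
    values assigned to the lower category.
    """
    diff = bic_a - bic_b
    if diff < 0:
        preferred = "A"
    elif diff > 0:
        preferred = "B"
    else:
        preferred = "neither"

    size = abs(diff)
    if size <= 2:
        degree = "Weak"
    elif size <= 6:
        degree = "Positive"
    elif size <= 10:
        degree = "Strong"
    else:
        degree = "Very Strong"
    return preferred, degree


# Hypothetical BIC values, for illustration only
print(bic_preference(100.0, 107.5))
```

Because only the difference enters the rule, both BIC values must be computed with the same definition before they are compared.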

It is important to recognize that the AIC statistic appears with and without the
denominator, n. Both forms are common, so care must be taken when comparing models
to make certain that the same definition has been used in all cases you are comparing. As
with the BIC, the model having the lowest AIC is preferred over others.
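
The effect of the denominator can be seen in a small Python sketch. It assumes the commonly used definition AIC = -2*LL + 2*k, with k the number of estimated parameters, and the scaled form obtained by dividing that quantity by n. The function names and the numeric values are hypothetical.

```python
def aic(log_likelihood, k):
    """AIC without the denominator: -2*LL + 2*k (commonly used definition)."""
    return -2.0 * log_likelihood + 2.0 * k


def aic_per_obs(log_likelihood, k, n):
    """AIC with the denominator n: (-2*LL + 2*k) / n."""
    return aic(log_likelihood, k) / n


# Hypothetical model: LL = -1500, 4 parameters, 1000 observations
raw = aic(-1500.0, 4)
scaled = aic_per_obs(-1500.0, 4, 1000)
print(raw, scaled)
```

The two forms differ by a factor of n, so a raw AIC from one package set beside a scaled AIC from another says nothing about which model fits better.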

Page 28: Section 2.2, opening paragraph. This paragraph may seem confusing, and I
offer a re-write below this paragraph. The paragraph as it exists in the book implies that
we have not yet addressed Fisher scoring, or the IRLS method of estimation, and that it will
follow the forthcoming discussion of Newton-Raphson type methods. However, the
reader will know that we just finished talking about IRLS methods, detailing the
theoretical justification as well as the algorithm. I wrote this section a year and a half
ago, and do not recall the exact rationale of the wording. However, I believe that I
intended to first discuss N-R methods, then IRLS. I later changed it for pedagogical
purposes, but failed to make the change to this paragraph. On the other hand, we will
discuss GLMs employing the observed information matrix in section 2.2.2, directly
following the derivation of the generalized Newton-Raphson type algorithm. Therefore,
there is some (only some) truth to what is implied in this paragraph with respect to order
of discussion, but it does need a revision as expressed below. My apologies to the reader.

NEW PARAGRAPH
In this section we discuss the derivation of the Newton-Raphson type algorithm. Until
recently, the only method used to estimate the standard negative binomial model was by
maximum likelihood estimation using a Newton-Raphson based algorithm. All other
varieties of negative binomial are still estimated using a Newton-Raphson based routine.
We shall observe in this section though, that the Iteratively Re-weighted Least Squares
method we discussed in the last section, known as Fisher Scoring, is a subset of the
Newton-Raphson method. We conclude by showing how the parameterization of the
GLM mean, μ, can be converted to Xβ.

Page 38, Question 7: The AIC and BIC statistics are defined on page 27, as part of Table
2.1. There are some types of models that do not produce a log-likelihood function, and
therefore not a deviance function. Quasi-likelihood models do not produce a viable
log-likelihood function that can be used in an AIC statistic. Some software uses a deviance
statistic for the basis of IRLS modeling, but does not use or define the log-likelihood. In
these cases an AIC statistic is not produced. Likewise for a model that uses a log-
likelihood function, but has no defined deviance. If it employs the BIC statistic requiring
a deviance, then it does not display a BIC statistic. The user can usually calculate it for
themselves.

Pages 85-90 Graphs


The lack of color in the book makes it difficult to determine which line is associated with
a specific mean and/or alpha. For Figures 5.1 - 5.6, which show different means for the
same value of alpha (α), the values of the mean from top to bottom at count 0, are:
[ 0.5, 1, 2, 5, 10]

For Figures 5.7 - 5.11, the values of alpha for a specific mean, from top to bottom at
count 0, are [3, 1.5, 1, .67, .33, 0]. The BOOK COVER is the same as Figure 5.10.
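
The top-to-bottom orderings at count 0 can be checked with a short Python sketch. It assumes the negative binomial parameterization with variance mean + alpha*mean^2, for which the probability of a zero count is (1 + alpha*mean)^(-1/alpha), with the Poisson value exp(-mean) as the limit at alpha = 0. The fixed values alpha = 1 and mean = 2 used for the two comparisons are illustrative assumptions, not the values used in the figures.

```python
import math


def nb_prob_zero(mu, alpha):
    """P(Y = 0) for a negative binomial with mean mu and variance mu + alpha*mu^2.

    alpha = 0 is treated as the Poisson limit, exp(-mu).
    """
    if alpha == 0:
        return math.exp(-mu)
    return (1.0 + alpha * mu) ** (-1.0 / alpha)


means = [0.5, 1, 2, 5, 10]            # listed top to bottom at count 0
alphas = [3, 1.5, 1, .67, .33, 0]     # listed top to bottom at count 0

# Assumed fixed values for the comparison: alpha = 1, then mean = 2
p_by_mean = [nb_prob_zero(m, 1.0) for m in means]
p_by_alpha = [nb_prob_zero(2.0, a) for a in alphas]

print(p_by_mean)
print(p_by_alpha)
```

Both lists decrease from left to right: at count 0 the curve for the smallest mean lies on top, as does the curve for the largest alpha.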

Page 126: line 3 of text (under display of table). Sentence beginning with "Age" is
missing the word "the". It should read, "Age and education are not contributory to the
model." This is not really an error, just better grammar.

Page 163: Wald statistic in bottom output (header):


The Wald chi2(2) is now reported as a Likelihood Ratio test (LR test) in Stata's ztnb
command. The output was created in a program by the same name I wrote before Stata
offered the command. Published on the SSC site, Stata Corp used it as the basis of the
new command, with LR rather than Wald, which came automatically with all ML
estimations.

Page 174: Bottom of page: Pursuant to the amendment I made to the text, as shown above
under Errata, the following addition to the correction could be made for clarification:
(2) 1 - [1/(1+exp(-xβ))], the prediction that y==0, and ... Also recall that for the final
formula of the paragraph, exp(xβ) can be substituted for μ.
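
Terms (1) and (2) can be sketched in Python as follows, with xb standing for the value of the linear predictor for one observation. The function names and the example value of xb are hypothetical, and the two-branch form is only a numerical-stability choice of this sketch.

```python
import math


def prob_one(xb):
    """Term (1): logistic inverse link, 1/(1 + exp(-xb)), the prediction that y == 1."""
    if xb >= 0:
        return 1.0 / (1.0 + math.exp(-xb))
    # Algebraically identical form that avoids overflow for very negative xb
    e = math.exp(xb)
    return e / (1.0 + e)


def prob_zero(xb):
    """Term (2): 1 - [1/(1 + exp(-xb))], the prediction that y == 0."""
    return 1.0 - prob_one(xb)


# Hypothetical linear predictor value
xb = 1.2
print(prob_one(xb), prob_zero(xb))
```

The two predictions sum to 1 for any value of the linear predictor, and both equal 0.5 when it is zero.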

CH 10: GEE models using progabide data


The version of the Stata software used for developing GEE models using the progabide
data did not recognize that the stationary, nonstationary, and unstructured correlation
structures are infeasible for these data; i.e. the algorithm did not recognize that the
correlation structure was not positive definite. The correlation matrices for these specific
models are not reliable. The software has since been amended to display an error
message when the matrix is infeasible, and will not produce a table of estimates or a post-
estimation correlation matrix. Stata versions 9.2 and higher work as they should; I am not
certain at which lower version the xtgee command was amended.

I suggest that the reader ignore the inherent unreliability of these specific model results,
reading the text and its interpretation as if appropriate convergence was achieved as it
appears in the output. The pedagogical value of the discussion is nevertheless valid. The
correlation values produced, together with parameter estimates and standard errors,
appear to be reasonable, and can be used with value in demonstrating how the models are
to be estimated and evaluated.

The stationary 4 correlation structure is feasible for this data, unlike other stationary
correlation structures. It can be developed using the command:
. xtgee seizures time timeXprog, fam(poi) i(id) t(t) corr(stat 4) force
It is important to use the force option when time intervals are not all equal. If in fact the
time intervals are equal, the force option will have no effect.

Note on new Chapter 10, GEE models: Justine Shults at the University of Pennsylvania
School of Medicine (Biostatistics) and her colleagues have built on previous work done
by N.R. Chaganty to construct an iterative adjustment to the underlying GEE algorithm
which guarantees, for selected correlation structures, a consistent estimate of the
correlation parameter and a positive definite estimated correlation matrix. The method is
called Quasi-Least Squares (QLS) and has particular use when the corresponding GEE
model is misspecified. In such cases the model typically fails to converge. Currently the
only correlation structures developed for QLS are the exchangeable, first order
autoregressive (AR 1), first order stationary (which Shults calls tri-diagonal), and
Markov, which is not in any current commercial GEE application. I recommend reading
J. Shults, S.J. Ratcliffe, and M. Leonard (2007), "Improved generalized estimating
equation analysis via xtqls for quasi-least squares in Stata," The Stata Journal, Vol. 7:2,
pp. 147-166. In the same issue, James Cui has an article titled "QIC program and model
selection in GEE analysis," pp. 209-220. Both should be of interest to those interested in
modeling longitudinal and otherwise correlated data using GEE methodology.
