Professional Documents
Culture Documents
with
mixed-effects
regression
by
simulated
data
sets,
where
the
Introduction
sociolinguistic
examples,
the
article
shows
how
common
of the approach.
of
exposition
here),
and
word
nests
in
following
following //).
set do not all come from the same speaker, nor does
community
researcher
were
asking
questions
about
individual
share
linguistic
constraints
on
variable
far
as
word-level
grouping
is
concerned,
of
these
contextual
elements,
known
as
predictors. For example, one could explore how postvocalic /r/ is realized differently by men and women (a
so-called social or external factor, of which gender, age,
socioeconomic class, and ethnicity are typical examples)
or how a word-final /t, d/ is affected differently
depending
on
the
preceding
context
(a
so-called
deviations
include
to
ones
effects
of
unknown
causes),
or
relate
are
other
predictors
of
included),
the
observations
that
the
from
predictors
the
are
models
measured
are
neither
and
rate
predictors.
regression
is
usually
that
Linear
regression
(both
considered
Multiple
speech
statistical
performs
only
We
still
assume
that
the
observations,
when
perfectly
method
circumstances.
unstressed
syllables;
Wolfram
1969).
But
perhaps
logistic
differences.
allowing
regression
for
with
categorical
predictors,
continuous
dependent
or
not
independent
and 10 female).
accounted
10
for
all
the
relevant
between-speaker
nesting
results
social
would
factor
be
like
gender
misleadingly
actually
arbitrary
both
even
speaker effects.
this
relative
solution
above
contributions
between-speaker
one
of
where,
for
individual
variable
like
example,
speaker
gender
the
and
cannot
a
be
accurately determined.
Fixed-effects
regression
encounters
the
same
11
12
from
given
research
design,
determines
benign.
the
speech
community,
with
statements
like
the
little attention since (but see Van de Velde and van Hout
2006: 341).
one
had
to
choose
Another
reason
that
individual
differences
have
between
individual
and
the
group,
the
only
individual
estimation
of
the
magnitude
of
the
constraints
themselves.
13
14
that
by
includes
items
not
data,
they
make
dangerous
nesting problem.
and
speaker
their
theory,
individual
pooling
statistical
the
are
in
The
(1991)
these
considered
Guy
[i]f
especially
the
themselves.
from
the
final
models
(Guy
1988:
128,
16
be expected by chance.
1990s.
Pinheiro
and
Bates
(2000)
achieved
significantly
effects modeling) in R.
different
from
zero,
we
can
perform
18
following
usage
http://glmm.wikidot.com/faq),
statistical
always evolving.
recommendations
summarize
although
the
does.
(Baayen et al. 2008; but see Barr et al. 2013). For fixed-
19
20
and
differently.
plausibly
for
different
words
too,
while
usually
proceeded
quite
differently.
While
the
Only
individual-speaker
grouping
will
be
these
cases,
the
omissions
have
substantial
consequences.
For the sake of simplicity in exposition, the next
sections largely set aside the potential benefits of
21
22
same
difference
in
percentages
does
not
always
represents.)
24
speaker.
Fixed-effects regression distorts group differences by
ignoring data imbalance and treating each token equally
(averaging by token), thereby potentially counting some
speakers much more than others. We return to Figure 1
to illustrate this distortion.
The left panel of Figure 1 ignores the fact that
different speakers contributed different numbers of
tokens. We have an average deletion rate of 54.7 percent
(3805/6962) for the data from male speakers, compared
with 52.2 percent (3496/6702) for the data from female
speakers. The gender effect size is 0.100 log-odds, as
noted above.
But if we count speakers equally and simply average
their deletion percentages, the gender difference comes
25
26
out less than half as large: 53.1 percent for the males vs.
But this does not take into account that the woman
the data is pooled, these two women both cause the /r/
rate
effects model.
for
females
to
be
underestimated,
in
turn
can
approach
vary;
here,
the
effect
size
of
gender
was
throws
away
valuable
data
arbitrarily,
27
28
acoustic
The
discussion
so
far
has
revolved
around
the
suppose
we
were
interested
in
the
resonances
in
the
vocal
tract
that
are
29
30
to
their
reading
passage
tokens,
position
of
both
small
circles
(speakers
with
25
every
the
fixed-effects
and
mixed-effects
32
exactly at -50 Hz. And in run 765, the large and small
circles are all on the low side, cancelling each other out;
fixed-effects
Lucille
298,
and
Mae
394.
And
the
more
model
(lacking
speaker
effect)
to
33
34
completion
in
have
been
observed
for
changes
within-speaker
logistic
effect.
Doing
so
averages
35
36
the same size topic effect: Mae and Lindsey each have a
37
38
differences in intercepts.
of individual-word variation.
predictors
like
speaker
and
word,
and
thus
40
they
context, following
context, morphological
41
42
Differences in significance
word
appears
with
many
different
following
contexts.
In the mixed model, the 40 estimated speaker
intercepts
are
approximately
normally
distributed,
values from likelihood-ratio tests returned by fixedeffects and mixed-effects models, regarding the six
predictors described above.
Some of the p-values are very small, and so they are
all given in scientific notation: for example, 2.06 x 10-17
means .0000000000000000206. The exact size of pvalues is meaningful, especially in a methodological
comparison like this. Indeed, the idea that p-values must
be reported and interpreted as either significant or not
has been challenged, even by Fisher, the inventor of the
p-value, himself (Gigerenzer et al. 2004).
43
44
the
as
three
phonological
predictors,
as
well
random
by-word
slope
could
distinguish
higher in all cases but one, and usually vastly higher; the
exception
nesting
is
model.
By
following
contrast,
context.
the
Without
fixed-effects
model
twice the usual gender effect: 81% deletion for men, 70%
33% deletion for men vs. 40% for women. Mixed models
pattern
transcripts
differences.
like
this,
or
but
audio
to
they
look
are
for
indispensable
other
for
predictors
similar: 0.184.
45
46
infinitesimal fixed-effects
p-value
for
word
detected
in
previous
studies
will
remain
significant using a mixed-effects model. With a fixedeffects model the significance of many predictors will be
exaggerated. This may not matter if we are considering a
predictor that actually has a real effect. But fixed-effects
regression may also claim significance for predictors
that have no relationship to the response other than that
due to random chance (Type I error).
47
48
show
more
deletion
overall,
even
in
disfavoring
50
following-context
category.
effects.
If
words
are
individually
develop somehow).
In terms of the deletion-favoring effect of a following
may
support
functionalist
tendency
for
lexical
phonology
account
(Guy
1991)
where
before a vowel.
51
52
This
change
is
brought
about
by
the
random
few most frequent words might suggest the mixedeffects model is a useful, if not essential, tool.
behavior
necessarily
being
linked
to
between-word
largely
comprised
of
different
words.
Fixed-effects
having
described
several
clear
53
54
Buckeye data, like any real data set, are unknown, but
speaker
Indeed,
random
revealing
effects,
the
which
extent
of
help
distinguish
individual-speaker
order
to
better
reveal
individual
patterns
and
55
56
While
there
exist
many
other
valuable
modern
cascading models.
accuracy
of
mixed
models
with
respect
to
both
At
the
same
time
that
they
focus
sharper
is
more
variable
between
speakers
and
hence
average
probability.
Conversely,
the
rare
58
Thanks
References
59
Silva,
R.,
editors,
Proceedings
of
the
Fourth
60
Unpublished manuscript.
http://ling.auf.net/lingBuzz/001080.
526-557.
Sage. 391-408.
61
62
Linguistics.
Dordrecht: Foris.
199-244.
63
64
3283-3299.
University of Wellington.
http://www.R-project.org/.
65
67
66