Professional Documents
Culture Documents
human language
Ramon Ferrer i Cancho and Ricard V. Sole
Complex Systems Lab, Universitat Pompeu Fabra, Doctor Aiguader 80, 08003 Barcelona, Spain; and Santa Fe Institute, 1399 Hyde Park Road,
Santa Fe, NM 87501
Edited by Kenneth W. Wachter, University of California, Berkeley, CA, and approved December 6, 2002 (received for review September 29, 2002)
a ij pr j ,
[1]
ps i , r j .
[2]
[3]
1
,
j
[4]
www.pnas.orgcgidoi10.1073pnas.0335980100
[10]
pr j , s i a ij
pr j
.
j
[5]
The effort for the speaker will be defined in terms of the diversity
of signals, here measured by means of the signal entropy, i.e.
H n S
psilogn psi.
[6]
i1
H m R s i
[7]
j1
i i 0
n
[11]
[12]
where p(rj si) p(rj, si)/p(si) (by the Bayes theorem). The effort
for the hearer is defined as the average noise for the hearer, that
is
H m R S 1 H n S R,
H m R S
psiHmR , si.
[8]
i1
An energy function combining the effort for the hearer and the
effort for the speaker is defined as
H m R S 1 H n S,
[9]
[13]
[14]
Notice that is present in only one of the terms of the right side
of the previous equation. Zipfs hypothesis was based on a
tension between unification and diversification forces (11) that
Eq. 13 does not accomplish. Eq. 9 does.
PNAS February 4, 2003 vol. 100 no. 3 789
APPLIED
MATHEMATICS
Fig. 2. (A) In(S, R), the average mutual information as a function of . * 0.41 divides In(S, R) into no-communication and perfect-communication phases.
(B) Average (effective) lexicon size, L, as a function of . An abrupt change is seen for 0.41 in both of them. Averages over 30 replicas: n m 150, T
2nm, and 2/(2n).
Discussion
Theoretical models support the emergence of complex language
as the result of overcoming error limits (5) or thresholds in the
amount of objects of reference that can be handled (6). Despite
their power, these models make little use of some well known
quantitative regularities displayed by most human languages
such as Zipfs law (11, 18). Most authors, however, make use of
Zipfs law as a null hypothesis with no particular significance (6).
As far as we know, there is no compelling explanation for Zipfs
law, although many have been proposed (1923). Random texts
(random combinations of letters and blanks) reproduce Zipfs
law (19, 2426) and are generally regarded as a null hypothesis
(18). Although random texts and real texts differ in many aspects
(26, 27), the possibility that Zipfs law results from a simple
process (not necessarily a random text) has not been soundly
denied. Our results show that Zipfs law is the outcome of the
nontrivial arrangement of wordconcept associations adopted
for complying with hearer and speaker needs. Sudden changes in
Fig. 2 and the presence of scaling (Fig. 3B) strongly suggest that
a phase transition is taking place at * (17).
Fig. 3. Signal normalized frequency, P(k), versus rank, k, for 0.3 (A), * 0.41 (B), and 0.5 (B and C) (averages over 30 replicas: n m 150 and
T 2nm). The dotted lines show the distribution that would be obtained if signals and objects connected after a Poissonian distribution of degrees with the
same number of connections of the minimum energy configurations. The distribution in B is consistent with human language ( 1).
790 www.pnas.orgcgidoi10.1073pnas.0335980100
1. Chomsky, N. (1968) Language and Mind (Harcourt, Brace, and World, New
York).
2. Deacon, T. W. (1997) The Symbolic Species: The Co-evolution of Language and
the Brain (Norton & Company, New York).
3. Pinker, S. & Bloom, P. (1990) Behav. Brain Sci. 13, 707784.
4. Bickerton, D. (1990) Language and Species (Chicago Univ. Press, Chicago).
5. Nowak, M. A. & Krakauer, D. C. (1999) Proc. Natl. Acad. Sci. USA 96,
80288033.
6. Nowak, M. A., Plotkin, J. B. & Jansen, V. A. (2000) Nature 404, 495498.
7. Hauser, M. D. (1996) The Evolution of Communication (MIT Press, Cambridge,
MA).
8. Miller, G. (1981) Language and Speech (Freeman, San Francisco).
9. Ujhelyi, M. (1996) J. Theor. Biol. 180, 7176.
10. Ko
hler, R. (1987) Theor. Linguist. 14, 241257.
11. Zipf, G. K. (1949) Human Behaviour and the Principle of Least Effort: An
Introduction to Human Ecology (AddisonWesley, Cambridge, MA); reprinted
in Zipf, G. K. (1972) Human Behaviour and the Principle of Least Effort: An
Introduction to Human Ecology (Hafner, New York), 1st ed., pp. 1955.
12. Gernsbacher, M. A., ed. (1994) Handbook of Psycholinguistics (Academic, San
Diego).
13. Ko
hler, R. (1986) Zur Linguistischen Synergetik: Struktur und Dynamik der Lexik
(Brockmeyer, Bochum, Germany).
14. Balasubrahmanyan, V. K. & Naranan, S. (1996) J. Quant. Linguist. 3, 177228.
15. Ash, R. B. (1965) Information Theory (Wiley, New York).
16. Sole, R. V., Manrubia, S. C., Luque, B., Delgado, J. & Bascompte, J. (1996)
Complexity 1, 1326.
17. Binney, J., Dowrick, N., Fisher, A. & Newman, M. (1992) The Theory of Critical
Phenomena: An Introduction to the Renormalization Group (Oxford Univ. Press,
New York).
18. Miller, G. A. & Chomsky, N. (1963) in Handbook of Mathematical Psychology,
eds. Luce, R. D., Bush, R. & Galanter, E. (Wiley, New York), Vol. 2.
19. Mandelbrot, B. (1966) in Readings in Mathematical Social Sciences, eds.
Lazarsfield, P. F & Henry, N. W. (MIT Press, Cambridge, MA), pp. 151168.
20. Simon, H. A. (1955) Biometrika 42, 425440.
21. Pietronero, L., Tosatti, E., Tosatti, V. & Vespignani, A. (2001) Physica A 293,
297304.
22. Nicolis, J. S. (1991) Chaos and Information Processing (World Scientific,
Singapore).
23. Naranan, S. & Balasubrahmanyan, V. (1998) J. Quant. Linguist. 5, 3561.
24. Miller, G. A. (1957) Am. J. Psychol. 70, 311314.
25. Li, W. (1992) IEEE Trans. Inf. Theor. 38, 18421845.
26. Cohen, A., Mantegna, R. N. & Havlin, S. (1997) Fractals 5, 95104.
27. Ferrer i Cancho, R. & Sole, R. V. (2002) Adv. Complex Syst. 5, 16.
28. Steels, L. (1996) in Proceedings of the 5th Artificial Life Conference, ed. Langton,
C. (AddisonWesley, Redwood, CA).
29. Nowak, M. A., Plotkin, J. B. & Krakauer, D. C. (1999) J. Theor. Biol. 200, 147162.
30. Ellis, S. R. & Hitchcock, R. J. (1986) IEEE Trans. Syst. Man Cybern. 16, 423427.
31. Reader, S. M. & Laland, K. N. (2002) Proc. Natl. Acad. Sci. USA 99, 44364441.
We thank P. Fernandez, R. Ko
hler, P. Niyogi, and M. Nowak for helpful
comments. This work was supported by the Institucio
Catalana de
Recerca i Estudis Avancats, the Grup de Recerca en Informa`tica
Biome`dica, the Santa Fe Institute (to R.V.S.), Generalitat de Catalunya
Grant FI/2000-00393 (to R.F.i.C.), and Ministerio de Ciencia y Technologia Grant BFM 2001-2154 (to R.V.S.).
APPLIED
MATHEMATICS