INTERPRETATION OF THE GENERALIZED ZIPF-MANDELBROT LAW PARAMETERS
Patricia Sastre-Vázquez (a), Yolanda Villacampa (b), José A. Reyes (b) & Fernando García-Alonso (b)
(a) Department of Applied Mathematics, Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina
(b) Department of Applied Mathematics, University of Alicante, Alicante, Spain
Published online: 28 Apr 2009.
Downloaded by [University of Haifa Library] at 12:58 25 October 2013
Cybernetics and Systems: An International Journal, 40: 326–336
Copyright © 2009 Taylor & Francis Group, LLC
ISSN: 0196-9722 print/1087-6553 online
DOI: 10.1080/01969720902847029
PATRICIA SASTRE-VÁZQUEZ (1), YOLANDA VILLACAMPA (2), JOSÉ A. REYES (2), and FERNANDO GARCÍA-ALONSO (2)
(1) Department of Applied Mathematics, Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina
(2) Department of Applied Mathematics, University of Alicante, Alicante, Spain
INTRODUCTION
This article deals with a particular type of text, namely, text systems in
accordance with the theory defined and published in Villacampa, Castro,
Usó, and Sastre (1999a) and Villacampa and Usó-Domenech (1999b).
For this type of text, the authors carried out an interpretation of the
law obtained as a generalization of the laws of Zipf (1932, 1949) and
Mandelbrot (1953), which they have named the generalized
range-frequency law and which has been published in Sastre-Vázquez,
where
a2b1 b2b2
C 2
a bb1 b2 1 Bb1 1; b2 1
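\[
B(\beta_1+1,\ \beta_2+1) = \frac{\Gamma(\beta_1+1)\,\Gamma(\beta_2+1)}{\Gamma(\beta_1+\beta_2+2)}, \quad \text{with } \beta_1 > -1 \text{ and } \beta_2 > -1. \tag{3}
\]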
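\[
\alpha = -q, \qquad \beta_1 = -b, \qquad C(\beta - x)^{\beta_2} = P_L,
\]
we obtain
\[
f(x) =
\begin{cases}
P_L\,(x+q)^{-b}, & x \in [\alpha, \beta], \\
0, & x \notin [\alpha, \beta],
\end{cases}
\quad \text{with } \beta_1 > -1 \text{ and } \beta_2 > -1;
\]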
following equation:
\[
H = -\sum_{i=1}^{n} p_i \ln p_i ,
\]
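The entropy formula can be illustrated with a minimal Python sketch. It assumes that p_i is the relative frequency of the i-th distinct word in a text; that reading, the whitespace-split sample sentence, and the function name `shannon_entropy` are illustrative assumptions rather than details taken from the article.

```python
import math
from collections import Counter


def shannon_entropy(words):
    """Return H = -sum(p_i * ln p_i), with p_i the relative frequency of each distinct word."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty text carries no information
    return -sum((c / total) * math.log(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical sample text, used only to exercise the function.
    sample = "the cat saw the dog and the dog saw the cat".split()
    print(round(shannon_entropy(sample), 4))
```

A text in which every word is the same has H = 0, and a text of n equally frequent distinct words has H = ln n, the maximum for a vocabulary of that size.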
ANALYSIS OF VARIANCE
With the data obtained concerning sighted and nonsighted (blind) boys
and girls of school age (7 to 13, 1st year to 7th year of basic general
education in Argentina), analysis of variance was carried out. To do so, the
following factors were taken into account for the model: group, gender,
and age, with all second-order interactions and considering the following
analysis variables: 1) text length, 2) vocabulary (number of different
words in the text), 3) token (quotient between vocabulary and text
length), 4) the parameters of the model derived from the Pearson system
(polynomial roots: α and β; exponents: β1 and β2).
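The first three analysis variables can be computed directly from a text, as in the following sketch. Splitting on whitespace and lower-casing the words are assumptions made here for illustration, since the article does not describe how the texts were tokenized; the function name `text_measures` is likewise hypothetical.

```python
def text_measures(text):
    """Return text length, vocabulary, and token (vocabulary / length) for one text."""
    # Assumption: words are whitespace-separated and compared case-insensitively.
    words = text.lower().split()
    length = len(words)                # 1) text length: total number of words
    vocabulary = len(set(words))       # 2) vocabulary: number of different words
    token = vocabulary / length if length else 0.0  # 3) token: vocabulary / length
    return {"length": length, "vocabulary": vocabulary, "token": token}


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from the study's data.
    print(text_measures("The cat saw the dog"))
```

Token lies between 0 and 1 for any nonempty text, and equals 1 only when no word is repeated.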
The model used for the analysis of variance (ANOVA) corresponded
to a completely randomized design, with three classification criteria: group,
gender, and age, taking into account all second-order interactions, whose
general analytical equation is as follows:
\[
y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \varepsilon_{ijkl}
\]
with y_ijkl being the observation of the variable considered for the ith level of
the group, the jth level of gender, the kth level of age, and the lth repetition;
μ: general average; α_i: fixed effect of the ith level of the group, with
i = 1, 2; β_j: fixed effect of the jth level of gender, with j = 1, 2; γ_k: fixed
effect of the kth level of age, with k = 1, 2, . . . , 7; (αβ)_ij: effect of the
interaction of the ith level of the group with the jth level of gender; (αγ)_ik:
effect of the interaction of the ith level of the group with the kth level
of age; (βγ)_jk: effect of the interaction of the jth level of gender with
the kth level of age; ε_ijkl: random error.
Duncan's test was used for the multiple comparison tests. This uses
multiple ranges and a variable level of significance that depends on the
number of averages involved at each stage. Before carrying out the
analysis of variance, the assumptions of normality and homogeneity
required by the models used were checked. To do so, residual analysis
was carried out, with no evidence that the hypothesis of normality was
not satisfied.
Verification of the homogeneity of variances was carried out using
graphical methods and Levene's test. This test consists of carrying
out an analysis of variance of the absolute value of the residuals and
tests the null hypothesis that the variances of the populations are equal.
The test did not reject this hypothesis. Analysis of the
graphs of the residuals versus the sample averages did not show any wor-
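Levene's test as described here, an analysis of variance on the absolute residuals, can be sketched in a simplified one-way form. The study's design has three factors; this sketch assumes a single grouping factor and takes residuals as deviations from each group mean, so it illustrates the idea rather than reproducing the authors' computation. The function name `levene_statistic` is hypothetical, and the p-value (which requires the F distribution with k - 1 and N - k degrees of freedom) is not computed.

```python
from statistics import mean


def levene_statistic(groups):
    """One-way ANOVA F statistic computed on absolute deviations from each group mean."""
    # Absolute residuals: |observation - mean of its own group|.
    abs_dev = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(abs_dev)                              # number of groups
    n_total = sum(len(d) for d in abs_dev)        # total number of observations
    grand = mean(v for d in abs_dev for v in d)   # grand mean of the absolute residuals
    ss_between = sum(len(d) * (mean(d) - grand) ** 2 for d in abs_dev)
    ss_within = sum((v - mean(d)) ** 2 for d in abs_dev for v in d)
    # F = mean square between groups / mean square within groups.
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


if __name__ == "__main__":
    # Hypothetical observations for two groups, not data from the study.
    print(levene_statistic([[1, 2, 3], [2, 4, 6]]))
```

A value of F near zero indicates that the groups have similar spread; the null hypothesis of equal variances is rejected only when F exceeds the chosen critical value.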
Dependent variable   Entropy   Length   Different Words   Token   Alfa   Beta   Beta 1   Beta 2
Table 2. Analysis of the variance for the models that were statistically significant
Age 7 10 8 9 11 12 13
Table 7. Pearson correlation coefficients / Prob > |R| under the null hypothesis Rho = 0 / N = 56

Alfa
             Token       Length     Vocabulary  Entropy
R            0.97645     0.71110    0.66311     0.20311
Prob > |R|   0.0001      0.0001     0.0001      0.1333

Beta
             Vocabulary  Length     Token       Entropy
R            0.99530     0.99131    0.69427     0.26444
Prob > |R|   0.0001      0.0001     0.0001      0.0489

Beta 1
             Entropy     Token      Length      Vocabulary
R            0.42405     0.33649    0.18294     0.13509
Prob > |R|   0.0011      0.0112     0.1772      0.3208

Beta 2
             Entropy     Token      Length      Vocabulary
R            0.21022     0.07037    0.05271     0.04966
Prob > |R|   0.1199      0.6063     0.6996      0.7163
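The quantities reported in Table 7 can be illustrated with a short sketch of the Pearson coefficient and the usual t statistic for testing Rho = 0, t = R * sqrt((N - 2) / (1 - R^2)) with N - 2 degrees of freedom. The function names are hypothetical, and the article does not state which software produced the table, so this is only one standard way of obtaining such values.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: Rho = 0, to be compared with Student's t on n - 2 d.f."""
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Using a value as printed in Table 7 (Beta vs. Entropy) with N = 56.
    print(t_statistic(0.26444, 56))
```

The probability Prob > |R| is the two-sided tail area of Student's t distribution beyond this statistic; computing it needs a t-distribution routine, which is left out of the sketch.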
With the estimates that arise from Tables 8–10, the following
regression equations can be formulated, which allow us to estimate the
parameters of the Pearson distributions:
Table 9. Regression of the beta parameter as a function of the proportion of the length of
text (length)
Dependent variable   FV   d.f.   Sum of Sq   Mean Sq   F   Prob > F
Table 10. Regression of the alpha parameter as a function of the proportion of different
words in a text (TOK), of entropy (Entropy)
Dependent variable   FV   d.f.   Sum of Sq   Mean Sq   F   Prob > F
CONCLUSIONS
The Alpha parameter of the generalized law is an indicator of the
proportion of different words that appear in the text. This parameter falls
by 0.25 units for each unit increase in the proportion of new words in
the text.
The Beta parameter of the generalized law indicates the length of the
text in question. Its value increases by 0.48 units for each word added to
the text.
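Read as regression slopes, the two statements above can be written as increments. The Δ notation and the shorthand names token and length are introduced here only for illustration; the intercepts of the regressions are not given in this section, so only changes in the parameters, not their absolute values, can be worked out.

```latex
\Delta\,\mathrm{Alpha} \approx -0.25\,\Delta(\text{token}),
\qquad
\Delta\,\mathrm{Beta} \approx 0.48\,\Delta(\text{length})
% Example: a text that is 10 words longer raises Beta by about 0.48 * 10 = 4.8 units.
% Example: an increase of 0.1 in the proportion of different words lowers Alpha by about 0.25 * 0.1 = 0.025 units.
```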
The Beta 1 parameter of the generalized law is a function of
the entropy of the text, of the proportion of different words, and the
interaction between these two variables. However, more studies
should be carried out to obtain a more precise explanation of this
parameter.
The Beta 2 parameter cannot be explained by the length, token,
vocabulary, and entropy variables. Based on the results obtained by
Sastre (2002), it is believed that this parameter could be related to the
language in which the texts are written.
The length of the narrative text increases with the age of the
schoolchildren, up to the age of 11, as does the vocabulary used.
Schoolchildren aged 11 produce the longest narrative texts. From
this age onwards, the length of the narrations falls until the
age of 13, when it reaches levels similar to those of children aged 10.
The length of texts for nonsighted schoolchildren aged 11 and 12 is
notably shorter than that of sighted schoolchildren of the same age, a
REFERENCES
Mandelbrot, B. 1953. Structure formelle des textes et communication. J. Word,
10: 1–27. Bingley, UK.
Sastre-Vázquez, P., Usó-Domenech, J. L., Villacampa-Esteve, Y., and Mateu, J.