
doi: 10.1111/ahg.12015

Surnames in Albania: A Study of the Population of Albania through Isonymy

Ilia Mikerezi², Endrit Xhina², Chiara Scapoli¹, Guido Barbujani¹, Elisabetta Mamolini¹, Massimo Sandri¹, Alberto Carrieri¹, Alvaro Rodriguez-Larralde³ and Italo Barrai¹

¹ Department of Life Sciences and Biotechnology, University of Ferrara, 44121, Ferrara, Italy
² Department of Biology, Faculty of Natural Sciences, Tirana, Albania
³ Centro de Medicina Experimental, Laboratorio de Genetica Humana, IVIC, Apdo. 20632, Caracas 1020A, Venezuela

Summary
In order to describe the isonymic structure of Albania, the distribution of 3,068,447 surnames was studied in the 12 prefectures and their administrative subdivisions: the 36 districts and 321 communes. The number of different surnames found was 37,184. The effective surname number α for the entire country was 1327; the average for prefectures was 653.3 ± 84.3, for districts 365.9 ± 42.0 and for communes 122.6 ± 8.7. These values display a variation of inbreeding between administrative levels in the Albanian population, which can be attributed to the previously published "Prefecture effect". Matrices of isonymic distances between units within administrative levels were tested for correlation with geographic distances. The correlations were highest for prefectures (r = 0.71 ± 0.06 for Euclidean distance) and lowest for communes (r = 0.37 ± 0.011 for Nei's distance).
The multivariate analyses (Principal Component Analysis and Multidimensional Scaling) of prefectures identify three main clusters, one toward the North, the second in Central Albania, and the third in the South. This pattern is consistent with important subclusters from districts and communes, which suggest that the country may have been colonised by diffusion of groups in the North-South direction, and from Macedonia in the East, over a pre-existing Illyrian population.

Keywords: Albania, population structure, isonymy, inbreeding, isolation by distance

Corresponding author: Chiara Scapoli, Department of Life Sciences and Biotechnology, University of Ferrara, Via L. Borsari 46, I-44121 Ferrara, Italy. Tel: +39-0532-455744; Fax: +39-0532-249761; E-mail: scc@unife.it

© 2013 Blackwell Publishing Ltd/University College London. Annals of Human Genetics (2013)

Introduction

Albania has a long and complex history. It was populated by an Aryan people, the Illyrians, around 3000 BC. In historical times, it was conquered by the Macedonians of Philip in 300 and 350 BC, coming under Greek power. Then, it became a Roman province, first under the Republic and then under the Empire, for about five centuries. After the split of the Empire, it stayed under the rule of the Byzantines until the 15th century, when it became part of the Ottoman Empire. When the Ottoman Empire dissolved in 1912, nationalism arose in Albania, and the country gained independence in 1920; excluding the World War II parenthesis, it has been independent ever since.

The language spoken in Albania is a separate Indo-European branch spoken by more than 7 million persons, and has influences from Latin, Greek, and in modern times from Southern Slavic. The land is mountainous, and the Albanians call themselves Shqipetari, "children of the eagles". The present language is derived from the Toske dialect, which is spoken in the South of the country, as opposed to the Gege dialect in the North. Due to the relative isolation of the country and to minor settlements of invading armies over the course of centuries, its population seems of considerable interest for the study of population genetics.

However, studies of the genetic structure of the Albanian population are recent and few, and refer mainly to the frequencies of traditional blood group markers (Mikerezi et al., 1995; Susanne et al., 1996) and to the distribution of surnames (Mikerezi et al., 2003). In this work, we continue to investigate the Albanian population with the aim of detecting its structure through the isonymic methods defined by Crow and Mange (1965) in the three administrative levels of the nation, namely: 12 prefectures, 36 districts and 321 communes. The data that were made available to us are the surnames of the electors in the database of the 2009 general elections.
We report here how, in Albania, isonymic distance varies
with geography, as we observed in other European countries.
We obtained indications of the direction of migration, by
studying the geographic heterogeneity of surnames. For each
level, we studied the effective surname number, α, and the
value of random inbreeding, FST.
We recall that surnames are a weak marker for inbreeding
and a strong marker for migration. Two Bianchi in Italy
may be more or less distantly related, as two White in
Britain, but one Bianchi in Britain or one White in Italy
are indicative of migration, as clearly as an immunofluorescent
cell in a negative field. With this proviso, our aim in this work
was the study of the present isonymic structure of Albania
resulting from surname drift and population movements in an
area about 320 km long and on average about 90 km wide,
bordering the Adriatic Sea, south of Montenegro and
Kosovo, west of Macedonia and north of Greece.

Materials and Methods


Administrative Subdivisions of Albania
In 2011, one of the authors (IM) obtained from the Central Election Commission (CEC) of Albania the data suitable for describing the isonymy structure of the country with the methodologies developed by us. In the data that were made available, a total of 3,068,447 individuals were distributed in the 12 prefectures, the 36 districts, and in 373 communes. The Albanian Administration classifies as communes 308 such units which are prevalently agricultural, plus 65 bashkias which are predominantly urban. However, several communes are pooled for electoral purposes, so that we had available 321 lower units, some of them groups of smaller units. In this analysis, we decided to use these hierarchical subdivisions as statistical units, since the geography of all three levels is well-defined, and all the individuals in the sample available are classified accordingly, communes inside districts inside prefectures inside Albania. Hence, for the analysis, we had available 37,184 surnames of more than 3 million individuals, all classified according to the administrative subdivisions.

The area studied covers the entire nation, about 28,000 square km, an area slightly larger than Sicily. The 12 prefectures differ in position, area, and population. The prefectures, districts, and communes are indicated in Figure 1. There are six prefectures in the North, the northernmost being Shkoder, Kukes, and Lezhe, then southward the two prefectures of Diber and Durres, followed by the prefecture of the capital Tirane. Traditionally, the River Shkumbin in the central zone across the prefecture of Elbasan separates the North from the South and the two dialects of Albania, the Gege from the Toske. The South has six prefectures, namely Elbasan itself, Fier and Berat, and Korçe, Vlore, and Gjirokaster. The last three are the southernmost and border with Greece.

Figure 1 Distribution of the 321 communes (dots) in the 12 prefectures and 36 districts as acquired from 2009 census data in Albania.

Differences in surnames due to the complexities of the 36-letter Albanian alphabet were maintained through the proper ASCII codes.


In the following subsections, we briefly recall the definitions of some of the statistics derived from the surname distributions and their meaning in the study of microevolution in human groups (for an exhaustive review, see Relethford, 1988).

Isonymy within and between groups
The main statistics derived from surname distributions are: (1) isonymy within a group J, namely $I_{jj} = \sum_k p_{kj}^2$, where $p_{kj}$ is the relative frequency of surname k in group J, and the sum comprises all surnames; and (2) random isonymy between groups I and J, estimated as $I_{ij} = \sum_k p_{ki} p_{kj}$, where $p_{ki}$ and $p_{kj}$ are the relative frequencies of surname k in groups I and J, respectively, and the sum comprises all surnames.
The distribution of surnames between groups, in this case prefectures, districts, and communes, is useful for assessing their population similarities, under the limit hypothesis of common origin.

Fisher's alpha (α)
Fisher's α was estimated according to Barrai et al. (1996). It estimates the number of surnames having equal frequency, which would result in the same isonymy as that observed. It is exactly homologous to the allele effective number in a genetic system (Barrai et al., 2000). A small value of α would indicate large inbreeding and drift, whereas a large value would indicate migration and low inbreeding. It has been shown (Wright, 1951) that, in the presence of a rate of migration m, $F_{ST} = 1/(4Nm + 1)$. Since $F_{ST} = I/4$ (Crow & Mange, 1965) and $\alpha = 1/I$ for large samples (Rodriguez-Larralde et al., 1993), it follows that $\alpha = 1/(4F_{ST}) = Nm + 1/4$. Then, for large N, α tends to Nm. This makes α a useful predictor of the evolutionary dynamics of a system, and a sufficient indicator of structure.

Isolation by distance
To detect isolation by distance, we calculate the linear correlation of surname distances (Lasker's, Euclidean and Nei's) between localities I and J, with their geographic distances.
Lasker's distance (Rodriguez-Larralde et al., 1998) is defined as

$$L = -\log(I_{ij}).$$

Euclidean distance (Cavalli-Sforza & Edwards, 1967) is defined as

$$E = \sqrt{1 - \sum_k \sqrt{p_{ki}\, p_{kj}}}$$

where the summation is over all surnames. Nei's distance (Nei, 1973) is

$$N_d = -\log\left(\frac{I_{ij}}{\sqrt{I_{ii}\, I_{jj}}}\right).$$

Euclidean and Nei's distances have been developed for purely genetic data; however, they can be applied to the frequencies of surnames, since these simulate alleles at a locus in the recombining region of the Y chromosome (the daughters inherit the surname with the paternal X chromosome).
As geographical coordinates, we used the centroids of prefecture, district and commune areas obtained from the ArcGis® (ESRI) map downloaded from the Global Administrative Areas site (http://gadm.org/).
The correlations of isonymic distances with the geographic ones give very similar results independently of the isonymic index used, and this is a further indication that any of the isonymy measures can be used without loss of generality.
The significance of correlations was assessed with Mantel's test using 1000 permutations (Mantel, 1967; Smouse et al., 1986). For a graphic representation of the surname relationship between different prefectures, these were mapped on the first and second dimension of the Multidimensional Scaling (MDS) of Lasker's distance matrix. In order to detect the direction of surname diffusion, following Menozzi et al. (1978), the first three components from the Principal Component Analysis (PCA) of the same matrix were also projected individually on the Albania map, with the ArcGis® (ESRI) software package. To complement and clarify the clustering, we built dendrograms (Ward, 1963; Cavalli-Sforza & Edwards, 1967) of prefectures and of districts. These were obtained from the matrix of Lasker's distances between administrative sections, using the agglomeration method of Ward (1963). They were considered only as a help to the clustering; we do not imply that the present situation was generated by subsequent splits of preexisting clusters.

Random kinship
Random kinship $\varphi_{IJ}(x)$ between any two localities I and J at distance x is given by

$$\varphi_{IJ}(x) = K \exp(-Bx)$$

(Malecot, 1955; Kimura, 1960), where K is the average kinship at geographic distance x = 0, say average $F_{ST}$, and B is a function of average mutation rate and of the variance of x. Then, $\varphi_{IJ}(x)$ is always positive and is expected to decrease exponentially to 0 with increasing distance. Random kinship was defined as

$$\varphi_{IJ}(x) = I_{IJ}(x)/4$$

(Barrai et al., 2012) with average $F_{ST}$ as the average kinship at distance x = 0.
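
The definitions recalled in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used for the analysis: the function names and the toy surname lists are invented here, the large-sample forms α = 1/I and FST = I/4 are assumed, and pairs of groups sharing no surname are given an infinite Lasker's or Nei's distance.

```python
import math
from collections import Counter


def frequencies(surnames):
    """Relative frequency p_k of each surname within one group."""
    counts = Counter(surnames)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def isonymy(p, q):
    """Random isonymy I_ij = sum_k p_ki * p_kj (I_jj when p and q coincide)."""
    return sum(f * q[name] for name, f in p.items() if name in q)


def alpha(p):
    """Effective surname number, large-sample form alpha = 1 / I."""
    return 1.0 / isonymy(p, p)


def f_st(p):
    """Random inbreeding by isonymy, F_ST = I / 4."""
    return isonymy(p, p) / 4.0


def lasker(p, q):
    """Lasker's distance L = -log(I_ij); infinite if no surname is shared."""
    i_ij = isonymy(p, q)
    return math.inf if i_ij == 0 else -math.log(i_ij)


def nei(p, q):
    """Nei's distance N_d = -log(I_ij / sqrt(I_ii * I_jj))."""
    i_ij = isonymy(p, q)
    if i_ij == 0:
        return math.inf
    return -math.log(i_ij / math.sqrt(isonymy(p, p) * isonymy(q, q)))


def euclidean(p, q):
    """Euclidean distance E = sqrt(1 - sum_k sqrt(p_ki * p_kj))."""
    overlap = sum(math.sqrt(f * q[name]) for name, f in p.items() if name in q)
    return math.sqrt(max(0.0, 1.0 - overlap))  # guard against rounding below 0


def kinship(p, q):
    """Random kinship between two groups, I_ij / 4."""
    return isonymy(p, q) / 4.0


def malecot(x, k, b):
    """Expected kinship K * exp(-B * x) at geographic distance x."""
    return k * math.exp(-b * x)


# Invented toy groups, for illustration only (not data from the study).
group_i = frequencies(["Hoxha", "Hoxha", "Shehu", "Shehu"])
group_j = frequencies(["Hoxha", "Kola", "Kola", "Kola"])
```

With these toy groups, `alpha(group_i)` is 2 (two equally frequent surnames), and the only shared surname gives `isonymy(group_i, group_j)` = 0.5 × 0.25 = 0.125.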



Results and Discussion

The Most Frequent Surnames
The distribution, by prefecture and district, of the surname numbers used in the analysis, with the main parameters derived from the isonymy theory, is given in Table 1. The data for communes and bashkias are presented in Table S1, available, as all further supplementary materials mentioned in this paper, at our website: http://web.unife.it/utenti/alberto.carrieri/ricerca.htm.
In Figure S1, we give the distribution of the logarithm of the number of surnames over the logarithm of the number of times they occur (Fox & Lasker, 1983; Zipf, 1935. See this last reference for the meaning and uses of the log-log distribution). In this case, it is fairly linear (Fig. S1). It is called a typical rank-size distribution or Zipfian curve, and it is so named by glottologists (Adamic & Huberman, 2002), and here it indicates the number of instances (people) with a unique surname.
In Albania, surnames originated and have been established generally in the same way as in other European countries. The Albanian language belongs to the Indo-European group, and, despite several exchanges with other languages, it has preserved its own structure in its formative elements. According to Bidollari (2010), the language does not possess general rules, as other Indo-European languages do, for patronymic formation, like the suffixes -adès, -eidès, -poulos in Greek, -ez in Spanish and Portuguese, -escu in Romanian, -ich in Slavic languages and so on. It does not possess suffix elements indicating lineage like -son, frequent in the English and Swedish languages, or -sohn in German, and -sen in Danish. However, many Albanian surnames have been formed by the patronymisation process of the anthroponyms (first names), ethnonyms and toponyms in all the cases when it was necessary to indicate social or geographic origin.
Albania was for nearly five centuries under Turkish occupation. Therefore, several surnames, like Hoxha, Hoxhaj, Shehu, Shehaj, Dervishi and others have been introduced through the Muslim religion, indicating in such cases levels of the religious hierarchy. Some other surnames have been strongly influenced by the Turkish language, for example, surnames that have been formed by the introduction of suffixes like -llari, -xhi, -lli and -li.
Here, we deal with 3,068,447 persons and 37,184 surnames, so that the average number of instances (persons) per surname, the so-called type-token ratio of glottologists, is 82 (see further down our ratio Sample Size/Surnames in Table 2 and King and Jobling (2009) for other type-token ratios in Europe).
We studied in some detail the 100 most frequent surnames (Table S2). Overall, these surnames comprise 583,708 occurrences, equal to 19.0% of the total number of surname occurrences used here. The most frequent surnames are Hoxha with 39,088 occurrences, Çela with 14,632, Marku with 13,852, Shehu with 12,348, and Muca with 12,236. After these, one finds Kola (11,443), Dervishi (10,953), Gjoka (10,191), Kurti (10,152) and in 10th place Koci (9533). Overall, the first 10 surnames comprise 144,428 individuals, or 4.7% of the total number of electors.
Surnames of clear Arabic origin are frequent in the North and the East of Albania. Dervishi (10,953), which is seventh in the general list, is the first surname of clear Arabic origin, followed by Elezi (8155), Sinani (6237), Hasani (4541), and Osmani (4103). The Turkish language was the main vehicle for other frequent surnames that were formed from first names of Arabic or Persian origin like Brahimaj (1684), Brahimi (2225), Elezaj (1970), Islami (1751), among several others.
Greek surnames, a result of the influence of the Christian orthodox religion, are frequent in the South of the country, which borders with Greece. Short lists of the 30 most frequent Albanian surnames of Arabic and Greek origin are given in Tables S3 and S4. However, these lists are far from complete, since they are based on our knowledge of Arabic and Greek, which is very limited. In particular, for the Greek names, we list only those which start with "Papa" (which means priest, father) to avoid uncertainties. There are 9961 surnames beginning with Papa, which are joined with another name of Christian (or sometimes non-Christian) origin, like Papajani, Papajorgji, and Papanikolla. Note the curious Papazisi, which might be a translocation of the Arabic Aziz (which means strong) on the Greek Papa. So Papazisi might be "the father of the strong".

Isonymy Parameters in Albanian Prefectures, Districts, and Communes

Fisher's alpha and inbreeding by isonymy
Values of α and FST are given in Table 1 for prefectures and districts and in Table S1 for communes. We recall that α, the effective surname number, is the inverse of isonymy I ($I = \sum p^2$ and $\alpha = 1/I$, Barrai et al., 1996), so that $F_{ST} = 1/(4\alpha)$, and then the meaning of α is exactly homologous to the effective allele number of genetic systems.
The effective surname number α, in Albania, was estimated at 1327 for the country, considered as a unit. The average for the 12 prefectures was 653.3 ± 84.3. For the 36 districts, it was 365.9 ± 42.0 and for the 321 communes it was 122.6 ± 8.7. The difference between the estimates of α, then of FST, in prefectures, districts, communes and for the country as a unit, is observed when different subdivisions of the same


Table 1 Prefecture, district, number of surnames N, number of different surnames S, Fisher's α, Karlin-McGregor's ν, isonymy I, and FST in Albania. Districts grouped by prefecture.

Prefecture    District            N        S      α       ν         I        FST

Berat                        169,377     5276    496   0.00293   0.00201   0.000505
              Berat          112,084     4042    420   0.00374   0.00238   0.000597
              Kucove          35,894     2123    277   0.00767   0.0036    0.000907
              Skrapar         21,399     1314    273   0.01258   0.00366   0.000926
Diber                        120,994     2482    377   0.00312   0.00265   0.000664
              Diber           50,866     1216    298   0.00582   0.00335   0.000844
              Mat             42,669     1296    247   0.00575   0.00404   0.001017
              Bulqize         27,459      915    191   0.00691   0.00521   0.001312
Durres                       289,512     9698    775   0.00268   0.00129   0.000323
              Durres         236,662     9149    757   0.0032    0.00132   0.000331
              Kruje           52,850     1861    337   0.00633   0.00297   0.000746
Elbasan                      299,600     6555    457   0.00153   0.00219   0.000548
              Elbasan        197,185     5568    442   0.00224   0.00226   0.000566
              Gramsh          25,062      839    168   0.00663   0.00595   0.001497
              Peqin           26,185     1103    103   0.00396   0.00953   0.002392
              Librazhd        51,168     1309    186   0.00362   0.00536   0.001346
Fier                         352,352     7479    623   0.00177   0.00161   0.000402
              Fier           193,704     5379    510   0.00264   0.00196   0.000491
              Lushnje        128,406     3691    345   0.00268   0.0029    0.000726
              Mallakaster     30,242      998    147   0.00482   0.0068    0.001709
Gjirokaster                  121,628     4544    910   0.00744   0.0011    0.000277
              Gjirokaster     66,969     3150    767   0.01133   0.0013    0.00033
              Tepelene        28,946     1621    273   0.00934   0.00365   0.000922
              Permet          25,713     1539    460   0.01756   0.00217   0.000553
Korçe                        264,449     7860   1110   0.00419   0.0009    0.000226
              Korçe          152,114     6250   1108   0.00724   0.0009    0.000227
              Kolonje         15,813     1232    453   0.02783   0.00221   0.000567
              Pogradec        64,452     2497    378   0.00583   0.00264   0.000664
              Devoll          32,070     1462    211   0.00653   0.00473   0.001191
Kukes                         72,875     1844    351   0.0048    0.00284   0.000714
              Kukes           39,510     1113    190   0.00479   0.00524   0.001317
              Has             13,247      270     84   0.00629   0.0118    0.00297
              Tropoje         20,118      886    198   0.00973   0.00504   0.001272
Lezhe                        148,395     4080    173   0.00117   0.00576   0.001442
              Lezhe           72,257     2617    133   0.00184   0.0075    0.001879
              Mirdite         26,750      778     67   0.00249   0.0148    0.003708
              Kurbin          49,389     2192    298   0.006     0.00335   0.000842
Shkoder                      239,312     7350    658   0.00275   0.00152   0.000381
              Shkoder        179,065     6642    637   0.00355   0.00157   0.000394
              Puke            21,712      892    123   0.00562   0.0081    0.002036
              Malesi madhe    38,535     1235    260   0.00671   0.00384   0.000965
Tirane                       712,068   19,057    997   0.00141   0.001     0.000251
              Tirane         631,027   18,415   1048   0.00167   0.00095   0.000239
              Kavaje          81,041     2743    282   0.00347   0.00354   0.000889
Vlore                        277,885     7335    913   0.00328   0.00109   0.000275
              Sarande         74,963     3534    470   0.00623   0.00213   0.000535
              Delvine         23,788     1504    339   0.01404   0.00295   0.000747
              Vlore          179,134     5327    694   0.00386   0.00144   0.000362
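
As a quick consistency check, the large-sample relations recalled in the text (α = 1/I and FST = I/4 = 1/(4α)) can be compared against a few rows of Table 1. The sketch below uses four rows copied from the table; the 2% tolerance is an arbitrary choice made here, and the small residual differences are assumed (not stated in the paper) to come from rounding and from sample-size corrections in the published estimates. The last lines turn the average α per administrative level into an implied FST. This is only an order-of-magnitude illustration of the prefecture effect, since the inverse of an average α is not the average FST.

```python
# Rows copied from Table 1: (unit, alpha, isonymy I, F_ST).
rows = [
    ("Tirane (prefecture)", 997, 0.00100, 0.000251),
    ("Korce (prefecture)", 1110, 0.00090, 0.000226),
    ("Has (district)", 84, 0.01180, 0.002970),
    ("Mirdite (district)", 67, 0.01480, 0.003708),
]


def rel_diff(estimate, reference):
    """Relative difference between a recomputed and a tabulated value."""
    return abs(estimate - reference) / reference


# For each row: alpha vs 1/I, F_ST vs I/4, and F_ST vs 1/(4*alpha).
checks = [
    (
        unit,
        rel_diff(1 / isonymy, alpha),
        rel_diff(isonymy / 4, fst),
        rel_diff(1 / (4 * alpha), fst),
    )
    for unit, alpha, isonymy, fst in rows
]

# Average alpha per level as reported in the text; the implied F_ST is an
# approximation introduced here for illustration only.
mean_alpha = {"country": 1327, "prefectures": 653.3, "districts": 365.9, "communes": 122.6}
implied_fst = {level: 1 / (4 * a) for level, a in mean_alpha.items()}

for unit, d_alpha, d_fst_i, d_fst_alpha in checks:
    print(f"{unit}: {d_alpha:.3%} {d_fst_i:.3%} {d_fst_alpha:.3%}")
```

The implied values increase from country to prefectures to districts to communes, in line with the ordering of FST across administrative levels discussed in the text.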



Table 2 Comparison of isonymy parameters in nine European countries, in five South-American countries, in the United States and Texas, and in Yakutia. Overall, 122 million surnames were analysed.

Country          Sample size       Surnames    α           Isolation      Type-token
                 (SS, millions)    (S)         (average)   by distance    (SS/S)

Europe
Austria               1            140,766       854         0.59            7.1
Albania               3.0           37,184       123         0.71           82
Belgium               1.1          137,442       997         0.74            8
France                6            495,104      1615         0.69           12.1
Germany               5.2          462,526      1596         0.51           11.2
Holland               2.4          126,485       787         0.46           19
Italy                 5.1          215,623      1236         0.61           23.7
Switzerland¹          1.7          166,116       891         0.72           10.2
Spain                 3.6
  Paternal                          94,886       134         0.21           38
  Maternal                         110,034       144         0.26           33
Asia
Yakutia               0.5           44,625       107         0.69           11.1
North America
United States        18            899,585      1366         0.24           20
Texas                 3.6          235,740       734         0.42           15.3
South America
Argentina³           22.6          414,441       422         0.47           54.5
Venezuela²            3.9           68,665       122         0.78           56.8
Bolivia⁴             23.2          174,922       122         0.5           144.6
Paraguay³             4.8           39,047       108         0.42          122.9

¹ Cantons.
² States.
³ Districts.
⁴ Provinces.
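
The "Isolation by distance" column of Table 2 reports a correlation between isonymic and geographic distance matrices; for Albania, 0.71 is the value obtained between prefectures with the Euclidean distance. The paper assesses such correlations with Mantel's test using 1000 permutations. A minimal pure-Python sketch of that procedure follows. It is not the authors' software: the function names, the seed, the one-sided p-value convention, and the toy matrices (seven invented units placed along a line, with isonymic distance growing with kilometres) are all assumptions made here for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Off-diagonal upper triangle after relabelling units by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(iso, geo, permutations=1000, seed=0):
    """Observed matrix correlation and one-sided permutation p-value."""
    n = len(iso)
    identity = list(range(n))
    geo_vec = upper_triangle(geo, identity)
    r_obs = pearson(upper_triangle(iso, identity), geo_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of one matrix together
        if pearson(upper_triangle(iso, order), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Invented toy data: positions in km along a line, and an isonymic distance
# that increases with geographic distance plus a small deterministic ripple.
coords = [0, 40, 90, 150, 200, 260, 320]
n_units = len(coords)
geo = [[abs(a - b) for b in coords] for a in coords]
iso = [
    [0.0 if i == j else 5.0 + 0.01 * geo[i][j] + 0.05 * ((i + j) % 3)
     for j in range(n_units)]
    for i in range(n_units)
]

r_obs, p_value = mantel(iso, geo)
```

Permuting rows and columns together, rather than shuffling the individual distances, preserves the dependence among the pairwise distances that share a unit, which is the reason a permutation test is used instead of the ordinary significance test for r.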

area and population are considered. Very properly in the case of Albania, the difference constitutes the "Prefecture Effect", identified for FST by Nei and Imaizumi (1966), in Japan, and so named by Scapoli et al. (2007). Nei and Imaizumi observed that, for the same area and population, small subdivisions have larger FST, and larger subdivisions have smaller FST. In their study, the effect was seen in towns and in the Japanese prefectures where the towns were located; hence the name. It could also be named a "geographic scale" effect that intervenes in many phenomena since it is just a question of heterogeneity increasing with population size. Of course, the prefecture effect is visible both on FST and α. It appears that Albania is no exception, and, since α is inversely related with FST, the sequence

$$F_{ST}^{\mathrm{Prefecture}} < F_{ST}^{\mathrm{District}} < F_{ST}^{\mathrm{Commune}}$$

is respected.
In Albania, the lowest levels of random inbreeding, indicated by FST, are expected and observed in the highly populated areas of the central part of the country, the area around the capital Tirana.
In the analysis, α is significantly and negatively correlated (r = −0.16) with latitude, possibly due to the average higher population density of southern communes. So, the largest values of α (the inverse of isonymy) were seen in the large towns, which are also capitals of prefectures. The highest values of α for communes were 1245 in the commune of Korçe, 1222 in Tirana, 990 in Durres, 748 in Vlore, and 720 in Shkoder. These large communes give the name to the prefectures where they are located. The lowest values observed in communes were α = 7 in Sheze, in the prefecture of Elbasan, α = 10 in Hysgjokaj and α = 11 in Ballagat, both communes in the prefecture of Fier, and α = 12 in Shtiqen and α = 13 in Surroj, both in Kukes. These communes are located in mountainous areas and have a small population.

Isolation by distance
We studied isolation by distance through the correlation of geographic with surname distances at the prefecture, district and commune levels. We found that Euclidean, Nei's and Lasker's distance between the 12 prefectures were


considerably correlated with linear geographic distance, with r = 0.709 ± 0.062, r = 0.560 ± 0.079 and r = 0.621 ± 0.082, respectively. The same tendency was observed between the 36 districts, although the correlations in this case were smaller, r = 0.581 ± 0.029, r = 0.543 ± 0.033 and r = 0.584 ± 0.030, respectively. Similarly, between communes, we observed 0.47 ± 0.008, 0.37 ± 0.011, 0.44 ± 0.011 for Euclidean, Nei's and Lasker's. As an example, the variation of Lasker's distance between prefectures is given in Figure 2 (see Fig. S2 for the distribution of Lasker's distances between districts and Fig. S3 for that of Lasker's distance between communes). Given the high correlation between the three measures of distance (for prefectures, r[Nei–Euclidean] = 0.85 ± 0.03; r[Nei–Lasker] = 0.74 ± 0.06 and r[Euclidean–Lasker] = 0.65 ± 0.08), for this analysis, we used mainly Lasker's distance.

Figure 2 Variation of Lasker's distance between prefectures with geographic linear distance.

The signal extracted from the scatter diagram of Lasker's distance over kilometres for communes is given in Figure 3. Linearity seems dominant; in Albania a clear tendency toward an asymptote is not observed, as it was in Spain, Bolivia and Chile (Rodriguez-Larralde et al., 2003, 2011; Barrai et al., 2012) where the relation between isonymic and geographic distance flattens at large distances. In Albania, there is a sharp increase of Lasker's distance up to 120 km, which gives indication of isolation and drift below that distance. After that, the increase in isonymic distance becomes minor, possibly indicating the effect of internal migration. The signal for Euclidean and Nei's distance is given in Figures S4 and S5, respectively. Note the rapid rise of Euclidean distance toward the asymptote, due to the sensitivity of this distance to the change of surnames and of their frequency with increasing geographic distance.

Figure 3 Variation of Lasker's distance (± s.d.) over kilometres between 321 communes in Albania.

Kinship
We plotted kinship between communes as previously defined as a function of geographic distance (Fig. 4). Note that at the commune level several pairs of communes (33 per thousand) did not share surnames.

Figure 4 Exponential decay of random kinship (± 1/2 s.d. to avoid intersection of the lower one with the abscissa) in Albania over geographic distance. Pairwise distances between communes.

The decrease of kinship with distance is significantly exponential, as predicted by Malecot (1955) (see also Kimura, 1960). Specifically, the exponential decay should be characteristic of structures more linear than Albania, for example, as observed by us in Chile. However, there is considerable and significant agreement between Malecot theory and kinship decay in Albania. Then, the Malecot model is very strong



and, possibly due to the large number of pairwise distances we had available, it is also applicable to a geographic structure, which, like Albania, is elongated from North to South but is poorly linear. We were not surprised when we observed the considerable agreement between the Malecot model and kinship decay in Chile, since this latter country is practically linear. Still, the agreement between the model and the observed decay of kinship over kilometric distance in Albania, which is elongated but far from linear, is indicative of a general validity of the model although originally it was derived only for a linear structure.

Relations between the Administrative Sections of Albania

In order to obtain a general idea of the movements of population groups in Albania, we conducted MDSs and PCAs on the matrix of Lasker's distances between prefectures, between districts and between communes. We report here and as supplementary material some of the results of these analyses.

Prefectures
The MDS projection on the first two dimensions of the matrix between prefectures (Fig. S6) differentiates a few clusters, which correspond to groups of neighbouring prefectures. In the resulting dendrogram (Fig. S7), a first large cluster composed mainly of the central prefectures is observed: Tirane, Durres, Elbasan, Diber, Fier and Berat. These last two form a subcluster within this cluster. Then, three prefectures in the South-East and the extreme South, namely Korçe, Vlore and Gjirokaster, form the next cluster. Finally, two prefectures of the North cluster together, Shkoder and Lezhe, whereas Kukes represents an exception because, despite being a mountainous prefecture of the North, it clusters together with the Central prefectures, possibly due to the emigration from the poorer areas toward the highly populated and richer areas around the capital Tirana.
From the MDS projection in Figure S6, some other minor but relevant points emerge, which complement the clustering of prefectures. In particular, Tirane, Durres and Elbasan stand alone at the centre of the bidimensional projection, removed from the other prefectures. Vlore is marginal as is Korçe.

Districts
The projection on the first two dimensions of the MDS tends to differentiate several clusters, which correspond fairly well to neighbouring districts (Fig. S8).
In the dendrogram (Fig. S9), the districts of Malesi e Madhe, Tropoje and Has, at the Northern border with Montenegro, cluster with Fier, Mallakaster and Vlore, which are in the South of the country. One district in the South, Tepelene, clusters with a Central-Northern belt of the seven districts of Durres, Kruje, Tirane, Mat, Bulqize, Diber and Kukes.
A second central main cluster, south of the former, includes in an East-West belt the districts of Kavaje, Lushnje, Peqin, Elbasan, Gramsh and Librazhd.
Then comes a Southern group of districts: Kucove, Berat, Skrapar, Korçe, Pogradec and Devoll. All these are also adjacent geographically. However, we underline that the clustering of Malesi e Madhe, Tropoje and Has in the North with the Vlore cluster in the South might indicate injection, between North and South of Albania, of eastern groups from Macedonia toward the Adriatic (Fig. S9).
From the projection, some other minor but relevant points emerge, which complement the clustering. In particular, the Tirane district stands at the centre of the bidimensional projection, with Durres. This might indicate that these districts, which together comprise almost one quarter of the Albanian population, possess most of the surnames of the nation. Malesi e Madhe in Shkoder, and Mallakaster in Fier are marginal both on the projection and in Albanian geography, bordering, respectively, Montenegro and Kosovo in the North and the limit of the Toske dialect in the South.
A visual indication of the isonymic proximity of districts is given by the maps of Figure 5, where the similarity of districts is indicated by the similar intensity of the same colour. It is appropriate at this point to indicate that new methods of identifying spatial concentration of surnames have recently been developed (e.g. Longley et al., 2011; Cheshire & Longley, 2012), which give specific examples of various ways of clustering and representing geographical dimensions of surname frequency data. Most interesting seem the developments which include forenames to detect ethnicity of groups (Mateos et al., 2011). This adds a further dimension to isonymy studies, which needs to be explored.

Communes
We found that, only at the commune level, there were 157 pairs of communes out of 51,360 which did not share surnames. Out of these 157 pairs, 49 included the commune of Liqenas in Korçe, which has a mainly Macedonian population. Also, 34 pairs included the commune of Lure in Diber, but we did not find a good reason for this last preference. Of course, there are various reasons why in Albania this absence of the same surname in small communes may occur. We believe that, among others, one reason is to be found in the complexity of the Albanian alphabet, which often results in the same name being written differently in different communes. However, there is also some effect of distance on the phenomenon. The average geographic distance between the 157 pairs having infinite Lasker's and Nei's distance is


Figure 5 Projection of Lasker's matrix of surname distances on districts in Albania by mapping (A) the first three PCA factors (I: Factor 1 = 42.8%; II: Factor 2 = 26.9%; III: Factor 3 = 11.5%) and (B) the first three MDS dimensions (I: Dimension 1; II: Dimension 2; III: Dimension 3; stress = 11.2%).

128.9 ± 14.7 km. The average distance for the other 51,203 pairs is 95.9 ± 0.06 km, and the difference is significant (t[∞] = 8.568, P << 0.0001). For the multivariate analysis of the distance matrices, we bypassed the problem posed by the elements of infinite value by replacing the 157 infinite isonymic distances with the nearest maximum observed. In this way, we met no complications in the subsequent analysis of the distance matrices of Lasker and Nei. It is important to note that if the 157 infinite distances are excluded, the correlations for communes rise from 0.44 to 0.47 for Lasker, and from 0.37 to 0.39 for Nei (refer again to Fig. S3 for Lasker's distance between communes).

As noted briefly above, we estimated in the commune of Korce, the capital of the homonymous prefecture, the highest value of α in the country (1245). Relatively high α and low estimates of inbreeding are also observed in several communes of the central area in the Tirana region. This might explain the position of this prefecture relative to the other groups and might indicate recent immigration toward the main urban area of Albania. Low α (and high FST) are observed in the



communes of Elbasan (one commune with α = 7), Fier (one commune with α = 10 and a second one with α = 11), and Kukes (one commune with α = 12 and one with α = 13).

We do not present here either the projection of the first two dimensions of MDS for the 321 × 321 matrix of the communes or the dendrogram derived from it. Both are, however, given as Figures S10 and S11. Since the projections of the individual names of 321 elements are illegible, we decided to label with different symbols the communes north and south of the River Shkumbin in central Elbasan (152 and 169, respectively), to detect whether the two main groups of points depicted in the projection (Fig. S10) contain a majority of communes where Gege or Toske is spoken. In fact, the subdivision is sharp: a vast majority of the Gege-speaking communes cluster together, as do those speaking Toske. Thus, the two main groups identified through surname distances are highly correlated with the two linguistic areas of Albania, the Gege area in the North and the Toske area in the South.

We used the same technique to visualise the clusters in the dendrogram, putting the labels G and T at the endpoints of the graph (Fig. S11). The resulting clusters correlate with latitude, but the North-South distribution of communes is not as clear as in the projection from the MDS.

Mapping of the first three components of Lasker's matrix

The structures revealed by the MDS and the dendrograms are only partially indicative of the possible movements of the population. Therefore, to have a general idea of the direction, if any, of settlements in Albania, we mapped on the nation (following Menozzi et al., 1978) the first three components of the matrix of Lasker's distance, obtained from a PCA and from the MDS. We provide the PCA components because the relative importance of each component is given by the corresponding eigenvalue, while the MDS provides the value of the stress for a judgement of the overall fit on the three dimensions. The resulting maps are given in Figure 5 (A for PCA and B for MDS).

The variation of the first component, which accounts for almost half of the variability (42.8%) in the North-South direction, indicates movement from the centre of the country toward North and South. This might mean that, from a chronological point of view, immigration was in the East-West direction from Macedonia, establishing a centre of high density of migrants, which subsequently moved North and South. The third component (11.5%) gives the same indication, although with lower intensity. The sense of movement may then be hypothesised as from the East toward the Adriatic coast, since the entry of surnames from the sea in significant numbers is unlikely.

It appears that only the second component (26.9%) is somewhat directional; the deviations from the second axis appear to be ordered in the North-South direction. However, although this component indicates movement in the North-South direction, the sense of movement cannot be detected from it, unless we accept that the highest deviations are the most recent ones.

Overall, the three components account for 81.2% of the surname variation as obtained from Lasker's distance matrix.

The mappings of the first three dimensions of the MDS seem to us compatible with those obtained from the PCA. The indication of possible East-West movement seems clear enough for the first and second dimensions, and less so for the third. So, the isonymic structure of Albania seems to be mainly due to ancient migration from the East toward the coast, with radiation toward the North and South followed by isolation, and with drift and short-range migration playing a major role in the generation of the present geographical variation of surnames.

Conclusions

The methodology described in this paper was used to analyse the isonymic structure of several South American countries (Rodriguez-Larralde et al., 2000, 2011; Dipierri et al., 2005, 2011; Barrai et al., 2012). In these countries, 4 (Venezuela), 24 (Argentina), 23 (Bolivia), 4.5 (Paraguay) and 16.5 (Chile) million surnames from the registers of electors were used. In European countries and in the United States, we analysed surnames of telephone users (Barrai et al., 2001; Scapoli et al., 2005, 2007; Rodriguez-Larralde et al., 2007). In thinly populated Siberia, we used half a million surnames (Tarskaya et al., 2009). The average value of α for all the cities (or states, in the case of Venezuela and the United States, or districts, in the case of Argentina and Paraguay), and the isolation by distance measured by the correlation between isonymic and geographic distances, are given in Table 2 for the countries studied up to now. Several features emerge from the comparisons reported in Table 2. First, European nations are generally similar in profusion of surnames, as measured by α, and in isolation by distance, as measured by the linear correlation. Secondly, the value of α is relatively small in Venezuela, Bolivia, Paraguay, Spain, Chile and now Albania. Thirdly, isolation by distance is practically absent in the United States, excluding bilingual Texas (Rodriguez-Larralde et al., 2007). In Albania, the average number of persons having the same surname (measured by the ratio Sample Size/Surnames, given as the index SS/S in Table 2) is more similar (82) to that of Argentina, Bolivia and Venezuela than to that of other European countries. It may be of some interest to compare our Table 2 with King and Jobling's (2009) Table 1. There, they give the mean number of carriers per surname in 5538 households in 27 countries. Where applicable, their results are consistent with ours.
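
As a quick check of the SS/S index quoted above for Albania, the ratio can be recomputed from the totals reported in the Summary (3,068,447 surnames in the sample and 37,184 different surnames). The following is only a minimal sketch; the function name `persons_per_surname` is an illustrative assumption and is not part of the original analysis.

```python
def persons_per_surname(sample_size: int, distinct_surnames: int) -> float:
    """Average number of persons sharing a surname (the SS/S index)."""
    if distinct_surnames <= 0:
        raise ValueError("distinct_surnames must be positive")
    return sample_size / distinct_surnames


# Totals reported in the Summary of this paper.
albania_ss_s = persons_per_surname(3_068_447, 37_184)
print(f"SS/S for Albania: {albania_ss_s:.1f}")  # prints "SS/S for Albania: 82.5"
```

The result, about 82.5, agrees with the value of 82 given in the text.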


Albania is the only European country in which we had near-census data (persons below 18 years of age were not included, our data being those of electors), as we had in South America. The ratio in countries where we had only the surnames of telephone users is about 25% of the ratio observed in countries where we had census data. We would like to label this as a census effect, but at present it is more prudent to attribute the phenomenon to a bias of the telephone directory. However, according to Lasker (1985), this should not be a major problem in countries with high telephone penetration rates, since telephone lines are a good sample measure of households in the country. In this context, 25% of the total population simply reflects four people per telephone line, which may well approach the average household size. In any case, we will wait to explore the effect further until more data from national censuses are available, because for the time being, barring Yakutia and Albania, the effect is confounded with the small number of different single surnames in the Spanish language.

In Albania, Gege is spoken in the northern prefectures and Toske in the southern ones. In Vlore and in Gjirokaster both Greek and Toske are spoken. It is interesting to note that in the map projection of the MDS analysis of communes, a vast majority of the Gege-speaking communes cluster together, as do those speaking Toske. Thus, the two main swarms identified through surname distances are highly correlated with the two linguistic areas of Albania: the Gege area in the North and the Toske area in the South.

In this analysis, all inbreeding estimates were lower (and α higher) in the highly populated central area, in the Tirana region. At present, most internal migration seems to take place toward the capital and the other main towns. Consequently, for the time being, we may conclude that the current population structure of this country is the result of the joint action of directional and short-range migration and drift, with directional migration dominating over drift at short distances, as suggested by the rapid rise of Lasker's distance with geographic distance below 120 km and by its flattening above that distance.

Acknowledgements

The authors are grateful to the CEC of Albania who conceded the data. The authors are also particularly grateful to both Referees who gave valuable advice. The work was supported by grants of the University of Ferrara to Chiara Scapoli.

References

Adamic, L. A. & Huberman, B. A. (2002) Zipf's law and the Internet. Glottometrics 3, 143–150.
Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E. & Rodriguez-Larralde, A. (1996) Isonymy and the genetic structure of Switzerland. I: The distributions of surnames. Ann Hum Biol 23, 431–455.
Barrai, I., Rodriguez-Larralde, A., Mamolini, E. & Scapoli, C. (2000) Elements of the surname structure of Austria. Ann Hum Biol 26, 1–15.
Barrai, I., Rodriguez-Larralde, A., Mamolini, E., Manni, F. & Scapoli, C. (2001) Elements of the surname structure of the USA. Am J Phys Anthropol 114, 109–123.
Barrai, I., Rodriguez-Larralde, A., Dipierri, J., Alfaro, E., Acevedo, N., Mamolini, E., Sandri, M., Carrieri, A. & Scapoli, C. (2012) Surnames in Chile. A study of the population of Chile through isonymy. Am J Phys Anthropol 147, 380–388. doi: 10.1002/ajpa.22000.
Bidollari, C. (2010) Onomastic investigations. In Albanian. Tirane: Botimet Kumi Editor.
Cavalli-Sforza, L. L. & Edwards, A. W. F. (1967) Phylogenetic analysis: models and estimation procedures. Am J Hum Genet 19, 233–257.
Cheshire, J. A. & Longley, P. A. (2012) Identifying spatial concentrations of surnames. Int J Geogr Inform Sci 26, 309–325.
Crow, J. F. & Mange, A. (1965) Measurements of inbreeding from the frequency of marriages between persons of the same surname. Eugen Q 12, 199–203.
Dipierri, J. E., Alfaro, E., Scapoli, C., Mamolini, E., Rodriguez-Larralde, A. & Barrai, I. (2005) Surnames in Argentina. A population study through isonymy. Am J Phys Anthropol 128, 199–209.
Dipierri, J. E., Rodriguez-Larralde, A., Alfaro, E. L., Scapoli, C., Mamolini, E., Salvatorelli, G., De Lorenzi, S., Sandri, M., Carrieri, A. & Barrai, I. (2011) Surnames in Paraguay: A study of the population of Paraguay through isonymy. Ann Hum Genet 75, 678–687. doi: 10.1111/j.1469-1809.2011.00676.x.
Fox, W. R. & Lasker, G. W. (1983) The distribution of surname frequencies. Int Stat Rev 51, 81–87.
Kimura, M. (1960) Outline of population genetics (in Japanese). Tokyo: Baifukan.
King, T. E. & Jobling, M. A. (2009) What's in a name? Y chromosomes, surnames and the genetic genealogy revolution. Trends Genet 25(8), 351–360.
Lasker, G. W. (1985) Surnames and genetic structure. Cambridge: Cambridge University Press.
Longley, P. A., Cheshire, J. A. & Mateos, P. (2011) Creating a regional geography of Britain through the spatial analysis of surnames. Geoforum 42, 506–516.
Malecot, G. (1955) Decrease of relationship with distance. Cold Spring Harbor Symp 20, 52–53.
Mantel, N. (1967) The detection of disease clustering and a generalized regression approach. Cancer Res 27, 209–220.
Mateos, P., Longley, P. A. & O'Sullivan, D. (2011) Ethnicity and population structure in personal naming networks. PLoS ONE 6, e22943. doi:10.1371/journal.pone.0022943.
Menozzi, P., Piazza, A. & Cavalli-Sforza, L. L. (1978) Synthetic maps of human gene frequencies in Europeans. Science 201, 786–792.
Mikerezi, I., Susanne, C., Bajrami, Z. & Kume, K. (1995) Differentiation of Albanian human populations and their relationships with Balkanic ethnic groups according to gene frequencies at ABO, MN and Rhesus loci. IUAES International Congress, April 20–21, 1995, Torino, Italia, p. 32.
Mikerezi, I., Pizzetti, P., Lucchetti, E. & Ekonomi, M. (2003) Isonymy and the genetic structure of Albanian population. Coll Antropol 27, 507–514.



Nei, M. (1973) The theory and estimation of genetic distance. In: Genetic structure of populations (ed. N. E. Morton). Hawaii: Hawaii University Press.
Nei, M. & Imaizumi, J. (1966) Genetic structure of human populations. I. Local differentiation of blood groups gene frequencies in Japan. Heredity 21, 9–36.
Relethford, J. H. (1988) Estimation of kinship and genetic distance from surnames. Hum Biol 60, 475–492.
Rodriguez-Larralde, A., Barrai, I. & Alfonzo, J. C. (1993) Isonymy structure of four Venezuelan states. Ann Hum Biol 20, 131–145.
Rodriguez-Larralde, A., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E. & Barrai, I. (1998) Isonymy and the genetic structure of Switzerland. II. Isolation by distance. Ann Hum Biol 25, 533–540.
Rodriguez-Larralde, A., Morales, J. & Barrai, I. (2000) Surname frequency and the isonymy structure of Venezuela. Am J Hum Biol 12, 352–362.
Rodriguez-Larralde, A., Gonzalez-Martin, J., Scapoli, C. & Barrai, I. (2003) The names of Spain: A study of the isonymy structure of Spain. Am J Phys Anthropol 121, 280–292.
Rodriguez-Larralde, A., Scapoli, C., Mamolini, E. & Barrai, I. (2007) Surnames in Texas: A population study through isonymy. Hum Biol 79, 215–239.
Rodriguez-Larralde, A., Dipierri, J., Alfaro, E., Scapoli, C., Mamolini, E., Salvatorelli, G., De Lorenzi, S., Carrieri, A. & Barrai, I. (2011) Surnames in Bolivia: A population study through isonymy. Am J Phys Anthropol 144, 177–184. doi: 10.1002/ajpa.21379.
Scapoli, C., Goebl, H., Sobota, S., Mamolini, E., Rodriguez-Larralde, A. & Barrai, I. (2005) Surnames and dialects in France: Population structure and cultural transmission. J Theor Biology 237, 75–86.
Scapoli, C., Mamolini, E., Carrieri, A., Rodriguez-Larralde, A. & Barrai, I. (2007) Surnames in Western Europe: A comparison of the subcontinental populations through isonymy. Theor Popul Biol 71, 37–48.
Smouse, P. E., Long, J. C. & Sokal, R. R. (1986) Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst Zool 35, 627–632.
Susanne, C., Bajrami, Z., Kume, K. & Mikerezi, I. (1996) Gene differentiation at the ABO, MN and Rhesus loci among Albanians and their relation with other Balkan populations. Gene Geogr 10, 31–36.
Tarskaya, L., Elchinova, G. I., Scapoli, C., Mamolini, E., Carrieri, A., Rodriguez-Larralde, A. & Barrai, I. (2009) Surnames in Siberia. A study of the population of Yakutia through isonymy. Am J Phys Anthropol 138, 190–198.
Ward, J. H. (1963) Hierarchical grouping to optimize an objective function. J Am Statist Assoc 58, 236–244.
Wright, S. (1951) The genetic structure of populations. Ann Eugen 15, 324–354.
Zipf, G. K. (1935) The psychobiology of language. Boston, MA: Houghton-Mifflin.

Supporting Information

Additional supporting information may be found in the online version of this article:

Table S1 Distribution of isonymy parameters.
Table S2 The 100 most frequent surnames in Albania.
Table S3 The most frequent names of Arabic origin in Albania.
Table S4 Surnames with the prefix Papa of clear Greek origin.
Figure S1 Variation of the number of occurrences in 3 million surnames in Albania.
Figure S2 Variation of Lasker's distance between 36 districts in Albania.
Figure S3 Variation of Lasker's distance between 321 communes in Albania.
Figure S4 Variation of Euclidean distance with geographic distance.
Figure S5 Variation of Nei's distance with geographic distance.
Figure S6 MDS on the matrix of Lasker's distances between Prefectures.
Figure S7 Dendrogram of Albania prefectures.
Figure S8 MDS of Lasker's distance matrix between districts.
Figure S9 Dendrogram of districts from the matrix of Lasker's distance.
Figure S10 Projection of the 321 communes of Albania on the first two dimensions of the matrix of Lasker's distances.
Figure S11 Dendrogram of communes.

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer-reviewed and may be re-organised for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

Received: 9 August 2012
Accepted: 18 November 2012
