
An R Function for the Blau Index of Diversity

In diversity research, one is often interested in how an individual feature is distributed among the members of a group, that is, in how diverse a group is with regard to that feature. If the feature is metric, e.g., age or organizational tenure, researchers quantify the diversity of a group with a measure of dispersion. For example, the standard deviation of the group members' ages can be employed to indicate the age diversity of a group. If researchers wish to quantify the diversity of a group with regard to a nominal feature, such as ethnicity, gender, or education, they usually employ the Blau Index (Blau, 1977). The Blau Index is calculated as

$$B = 1 - \sum_{i=1}^{k} p_i^2$$

where $p_i$ is the proportion of group members in category $i$ and $k$ is the number of different categories of the feature across all groups. If a group is homogeneous with regard to the feature in question, e.g., if all group members have the same nationality, the Blau Index of the group for nationality is 0. If every member of the group has a different nationality, the Blau Index of that group for nationality approaches 1 as the group gets larger. The maximum Blau Index for a feature in a given data set depends on the number of categories of that feature in the data set: with $k$ categories it is $(k - 1)/k$, which is reached when all categories are equally represented.

A number of studies have linked the Blau Index of (management) teams to team processes and team outcomes (e.g., Bantel & Jackson, 1989; Richard, Barnett, Dwyer, & Chadwick, 2004; Chandler, Honig, & Wiklund, 2005; Pitts, 2005). Therefore, I also wanted to include the Blau Index for various features in the analysis of the data I obtained in an attempt to replicate and extend a study by Homan, van Knippenberg, van Kleef, & De Dreu (2007). In doing so, I was unable to locate an R function for calculating the Blau Index. I therefore wrote my own and thought that others might also find it useful.
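To make the definition concrete (one minus the sum of squared category proportions), here is a minimal sketch in R for a single group. The helper name `blau.single` and the two example vectors are hypothetical and only serve as an illustration; this is not the function presented in this post.

```r
# Hypothetical helper, for illustration only:
# Blau Index of one group, given the feature values of its members.
blau.single <- function(x) {
  p <- table(x) / length(x)  # proportion of members in each category
  1 - sum(p^2)               # one minus the sum of squared proportions
}

homogeneous <- c("Swiss", "Swiss", "Swiss", "Swiss")
balanced    <- c("Swiss", "German", "Finnish", "Dutch")

blau.single(homogeneous)  # 0: all members are in the same category
blau.single(balanced)     # 0.75: four equally sized categories, (4 - 1) / 4
```

The second call also shows why the index only approaches 1: with four members who all differ, it is 1 - 4 * 0.25^2 = 0.75.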
The function takes two arguments: a numeric vector, groupid, denoting the group of every person/participant in the data set, and a second vector, feat, that can be either numeric or string, denoting the expression of the feature for each person/participant in the data set. The function returns a vector whose length equals the number of groups, containing the Blau Index for each group.

Example:

    groupid <- c(1, 1, 1, 2, 2, 2, 2)
    feature <- c("male", "male", "male", "female", "female", "male", "male")
    blau.index(groupid, feature)
    [1] 0.0 0.5

Here is the code:
    blau.index <- function(groupid, feat) {
      # Recode the group IDs as consecutive integers (1, 2, ...), so that the
      # loop below does not depend on how the groups are numbered or named
      group.num <- as.integer(as.factor(groupid))
      number.of.groups <- nlevels(as.factor(groupid))

      # Recode the feature the same way, whether it is numeric or strings;
      # missing values are treated as a category of their own
      feat.factor <- addNA(as.factor(feat), ifany = TRUE)
      feat.num <- as.integer(feat.factor)
      number.of.features <- nlevels(feat.factor)

      blau.index <- rep(0, number.of.groups)
      for (i in seq_len(number.of.groups)) {
        members <- which(group.num == i)
        for (j in seq_len(number.of.features)) {
          # proportion of the members of group i that fall into category j
          p <- sum(feat.num[members] == j) / length(members)
          blau.index[i] <- blau.index[i] + p^2
        }
      }

      blau.index <- 1 - blau.index
      return(blau.index)
    }
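As a quick sanity check, the values from the example above can be reproduced by hand with base R. This is only a sketch that does not call blau.index() itself; the helper name `blau.by.hand` is hypothetical.

```r
groupid <- c(1, 1, 1, 2, 2, 2, 2)
feature <- c("male", "male", "male", "female", "female", "male", "male")

# Hypothetical helper: one minus the sum of squared category proportions
blau.by.hand <- function(x) 1 - sum(prop.table(table(x))^2)

# Apply the helper to the feature values of each group separately
expected <- tapply(feature, groupid, blau.by.hand)
expected
# Group 1 (three males):            1 - 1^2             = 0
# Group 2 (two females, two males): 1 - (0.5^2 + 0.5^2) = 0.5
```

If a function for the Blau Index returns something other than 0 and 0.5 for these data, the way it counts category members is the first place to look.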

I would appreciate suggestions for improvements.

References

Bantel, K., & Jackson, S. (1989). Top management and innovations in banking: Does the composition of the top team make a difference? Strategic Management Journal, 10, 107–124.

Blau, P. M. (1977). Inequality and heterogeneity. New York, NY: Free Press.

Chandler, G. N., Honig, B., & Wiklund, J. (2005). Antecedents, moderators, and performance consequences of membership change in new venture teams. Journal of Business Venturing, 20, 705–725.

Homan, A. C., van Knippenberg, D., van Kleef, G. A., & De Dreu, C. K. W. (2007). Bridging faultlines by valuing diversity: Diversity beliefs, information elaboration, and performance in diverse work groups. Journal of Applied Psychology, 92(5), 1189–1199.

Pitts, D. (2005). Diversity, representation, and performance: Evidence about race and ethnicity in public organizations. Journal of Public Administration Research and Theory, 15, 615–631.

Richard, O., Barnett, T., Dwyer, S., & Chadwick, K. (2004). Cultural diversity in management, firm performance, and the moderating role of entrepreneurial orientation dimensions. Academy of Management Journal, 47, 255–266.

Labels: R, research, social psychology, statistics
Posted by Bertolt at 22:02

2 comments:

Stephan Kolassa said...

Here you go:

    blau <- function (features) { 1-sum((table(features)/length(features))^2) }
    by(data=feature,INDICES=groupid,FUN=blau)

The first line defines a function "blau", which calculates the Blau index for a single group by tabulating the feature using table(), getting the relative frequencies of the features by dividing by the length of the features vector, squaring, summing, and subtracting from one. The second line uses by() to apply blau() separately to the features as indexed by the groupid vector. We even get a nice tabulated output. by() is often quite helpful...

This solution also seems to be faster:

    nn <- 1000000
    set.seed(2009)
    groupid <- sample(seq(1,10),size=nn, replace=TRUE)
    feature <- sample(c("male","female"),size=nn, replace=TRUE)
    system.time(blau.index(groupid, feature))
    system.time(by(data=feature,INDICES=groupid,FUN=blau))

yields 5.09/.27/5.39 for blau.index() and .67/.05/.72 for by(blau).

Good luck with your diversity research!

9:57 AM