Learning Goals
Why do we use graphs?
Basic graphs for categorical data
How do we answer these questions? Probably the first thing you would do is count the observations for each car make and note them down. Once you note them down in tabular form, you have a frequency table. A frequency table for categorical data is a table that displays the possible categories along with their associated frequencies or relative frequencies.
Applied Statistics and Computing Lab
At one glance, the answers are there! Buick is the most common brand in that showroom and Pontiac is the least common. A frequency table is the simplest representation of data in tabular form. The same data can also be represented pictorially in a number of ways!
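A frequency table of this kind can be sketched in R with `table()` and `prop.table()`. The vector `makes` below is invented purely for illustration (it is not the showroom data from the slide); it is only arranged so that Buick is the most common make and Pontiac the least common.

```r
# Hypothetical data: these makes and counts are made up for illustration only.
makes <- c("Buick", "Ford", "Buick", "Pontiac", "Ford", "Buick",
           "Chevrolet", "Buick", "Ford", "Chevrolet")

freq     <- table(makes)        # absolute frequency of each category
rel_freq <- prop.table(freq)    # relative frequencies (they sum to 1)

# Combine both into one frequency table
freq_table <- data.frame(
  make               = names(freq),
  frequency          = as.vector(freq),
  relative_frequency = as.vector(rel_freq)
)
print(freq_table)

# Most and least common categories
most_common  <- names(freq)[which.max(freq)]
least_common <- names(freq)[which.min(freq)]
```

With real data, `makes` would simply be the column that records the make of each car.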
Bar Chart
A bar chart is a graph of the frequency distribution of categorical data. Each category in the frequency distribution is represented by a bar or rectangle. The bars have equal width, so the height (and hence the area) of each bar is proportional to the corresponding frequency. A bar chart may be vertical or horizontal. The following is vertical.
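As a minimal sketch of the two orientations, base R's `barplot()` draws vertical bars by default and horizontal bars when `horiz = TRUE`. The figures are the 2011 Andhra Pradesh values from the R-Codes slides; the name `pop2011` is chosen here for illustration.

```r
# 2011 population figures, as listed in the R-Codes slides
pop2011 <- c(RuralMale = 28219760, RuralFemale = 28092028,
             UrbanMale = 14290121, UrbanFemale = 14063624)

# Vertical bars (the default); barplot() returns the bar midpoints
mids_vertical <- barplot(pop2011 / 1e6,
                         ylab = "Population, in millions",
                         main = "AP population in 2011 (vertical)")

# Horizontal bars: same data, drawn with horiz = TRUE
# las = 1 keeps the category labels horizontal; cex.names shrinks them to fit
mids_horizontal <- barplot(pop2011 / 1e6, horiz = TRUE, las = 1,
                           cex.names = 0.7,
                           xlab = "Population, in millions",
                           main = "AP population in 2011 (horizontal)")
```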
Inferences
For policy reasons, one is interested in the composition of the population of Andhra Pradesh. From a bar chart, we can read off the most and least frequently occurring categories: rural male has the highest occurrence, urban female the lowest. However, one may also be interested in the relative share of each category, rather than the absolute figure. Visually, rural male and rural female appear almost equal in the bar chart, and so do urban male and urban female. To analyze their relative shares, we look at a pie chart.
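The relative shares can also be computed directly before drawing anything. This is a small sketch using `prop.table()` on the 2011 figures from the R-Codes slides; the names `pop2011` and `share` are illustrative.

```r
# 2011 population figures, as listed in the R-Codes slides
pop2011 <- c(RuralMale = 28219760, RuralFemale = 28092028,
             UrbanMale = 14290121, UrbanFemale = 14063624)

share     <- prop.table(pop2011)      # each category as a fraction of the total
share_pct <- round(100 * share, 1)    # the same shares as percentages
print(share_pct)
```

The two rural categories each come to roughly a third of the total and the two urban categories to roughly a sixth each, which matches the near-equal bar heights noted above.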
Pie Chart
A circle is used to represent the whole data set. Slices of the pie represent the possible categories. The area of the slice for a particular category is proportional to the corresponding relative frequency. When to use: categorical data with a relatively small number of categories. If there are many categories, merge a few of them into one. A pie chart is most useful for illustrating the proportions of the whole data set that fall in the various categories. Criticism: it is not an effective visual display if the percentages are not specified.
The most marked increase in population is in the urban female category, followed closely by urban male. There is only a very slight increase in the rural male and rural female categories. A possible cause is rural-to-urban migration, which needs investigation. Since every category increased from 2001 to 2011, the total population also increased over that period. However, we cannot see exactly how much the total population increased from 2001 to 2011. There is also no information about the relative contribution of each category to the total. For this, use a segmented (stacked) bar plot.
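The increases read off the chart can be checked numerically. The sketch below rebuilds the `APPopulation` matrix from the R-Codes slides and computes the absolute and percentage increase per category, plus the total increase; the variable names other than `APPopulation` are illustrative.

```r
# Population by category and year, as listed in the R-Codes slides
APPopulation <- cbind(c(28219760, 28092028, 14290121, 14063624),
                      c(27937204, 27463863, 10590209, 10218731))
rownames(APPopulation) <- c("RuralMale", "RuralFemale", "UrbanMale", "UrbanFemale")
colnames(APPopulation) <- c("2011", "2001")

# Absolute and percentage increase per category, 2001 to 2011
increase     <- APPopulation[, "2011"] - APPopulation[, "2001"]
pct_increase <- 100 * increase / APPopulation[, "2001"]
print(round(pct_increase, 1))

# Total population per year and the overall increase
totals         <- colSums(APPopulation)
total_increase <- totals[["2011"]] - totals[["2001"]]
print(total_increase)
```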
Inferences
The segmented bar plot gives an idea of the total increase in population from 2001 to 2011. It also shows the relative share of each category in each year.
Rural male is the most dominant category, followed by rural female. From 2001 to 2011, the share of the urban male and urban female population increased relative to that of rural male and rural female.
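One way to make the shift in relative shares explicit is to rescale each year to 100% before stacking. This is a sketch, not part of the original slides: it applies `prop.table(..., margin = 2)` to the `APPopulation` matrix from the R-Codes slides so that each year's column sums to 1; `share_by_year` and `urban_share` are illustrative names.

```r
# Population by category and year, as listed in the R-Codes slides
APPopulation <- cbind(c(28219760, 28092028, 14290121, 14063624),
                      c(27937204, 27463863, 10590209, 10218731))
rownames(APPopulation) <- c("RuralMale", "RuralFemale", "UrbanMale", "UrbanFemale")
colnames(APPopulation) <- c("2011", "2001")

# Share of each category within its year (each column sums to 1)
share_by_year <- prop.table(APPopulation, margin = 2)

# Combined urban share per year
urban_share <- colSums(share_by_year[c("UrbanMale", "UrbanFemale"), ])
print(round(100 * urban_share, 1))

# 100% stacked bar plot, years in chronological order;
# xlim leaves room on the right for the legend
colors <- c("red", "bisque", "darkslategray", "violet")
barplot(100 * share_by_year[, c("2001", "2011")], col = colors,
        xlim = c(0, 4), xlab = "Year", ylab = "Share of population (%)",
        legend.text = rownames(APPopulation),
        main = "Relative share of each category by year")
```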
R-Codes
# Creating the data: rows are categories, columns are census years
APPopulation <- cbind(c(28219760, 28092028, 14290121, 14063624),
                      c(27937204, 27463863, 10590209, 10218731))
rownames(APPopulation) <- c("RuralMale", "RuralFemale", "UrbanMale", "UrbanFemale")
colnames(APPopulation) <- c("2011", "2001")

# Bar plot of the 2011 figures
colors <- c("red", "bisque", "darkslategray", "violet")
barplot(APPopulation[, "2011"] / 1000000, col = colors)
title(main = "Barplot of AP Population in 2011 (in millions)")

# Multiple bar graph: one row per year (2001, 2011), one column per category
A <- matrix(c(10218731, 10590209, 27463863, 27937204,
              14063624, 14290121, 28092028, 28219760),  # the data elements
            nrow = 2,       # number of rows
            ncol = 4,       # number of columns
            byrow = TRUE)   # fill matrix by rows
colors <- c("red", "bisque")
barplot(A / 1000000, names.arg = rev(rownames(APPopulation)),
        legend.text = c(2001, 2011), beside = TRUE,
        main = "Distribution of population by category",
        xlab = "Categories", ylab = "population, in millions",
        ylim = c(0, 80), col = colors)
R-Codes (Continued)
# Segmented bar graph (yearwise): one stacked bar per year
colors <- c("red", "bisque", "darkslategray", "violet")  # one color per category
barplot(APPopulation / 1000000,
        main = "Distribution of population by category yearwise",
        xlab = "Year", ylab = "population, in millions",
        col = colors, legend = rownames(APPopulation))

# Segmented bar graph (categorywise): one stacked bar per category
colors <- c("red", "bisque")
A <- matrix(c(10218731, 10590209, 27463863, 27937204,
              14063624, 14290121, 28092028, 28219760),  # the data elements
            nrow = 2, ncol = 4, byrow = TRUE)
barplot(A / 1000000, names.arg = rev(rownames(APPopulation)),
        legend.text = c(2001, 2011), beside = FALSE,
        main = "Distribution of population by category",
        xlab = "Categories", ylab = "population, in millions",
        ylim = c(0, 90), col = colors)

# Pie chart
slices <- c(28219760, 28092028, 14290121, 14063624)  # 2011 figures, matching the title
lbls <- c("RuralMale", "RuralFemale", "UrbanMale", "UrbanFemale")
pct <- round(slices / sum(slices) * 100)
lbls <- paste(lbls, pct)            # add percents to labels
lbls <- paste(lbls, "%", sep = "")  # add % to labels
pie(slices, labels = lbls, col = rainbow(length(lbls)),
    main = "Pie Chart of APPopulation in 2011")
Thank you