Managerial Data Analysis

Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
CASE STUDY
Record 40-60 data points for two statistical variables (X and Y) of your choice:
Admission, July 2011 session, "Alexandru Ioan Cuza" Police Academy. List of admitted candidates, Police - Law.
X (independent variable) = Baccalaureate Mark; Y (dependent variable) = Admission Mark.

No.  Selection unit (county)  Surname and first name        Gender  Baccalaureate Mark (X)  Admission Mark (Y)
1    AG                       GHINESCU ANDREEA-FLORENTINA   F       6,50                    8,98
2    AG                       MORARU ANCA-NICOLETA          F       6,50                    8,90
3    AG                       ALEXE SIDONIA-CRESCENZIA      F       6,50                    8,75
4    AR                       RADU LUIZA-CLAUDIA            F       6,80                    8,75
5    AG                       ARSENE STEFAN-ALEXANDRU       M       7,00                    8,80
6    AG                       FLORESCU IONUT-ANDREI         M       7,00                    8,80
7    DJ                       MARCULESCU ROXANA             F       7,50                    8,85
8    AG                       TRACHE CAMELIA-ELENA          F       7,50                    8,85
9    AG                       CALUGAROIU LIVIU-MARIAN       M       7,80                    8,93
10   CS                       TEODORESCU SILVIU-PETRU       M       7,80                    8,93
11   BR                       STOILESCU FLORIAN             M       7,90                    8,95
12   AG                       ANITA ALEXANDRU               M       7,90                    8,95
13   SV                       DUMITRAS CORINA-LAVINIA       F       8,00                    8,90
14   BT                       ISTRATE DANIELA-ANDREEA       F       8,00                    8,90
15   AG                       CIOBANU DANIEL-VALENTIN       M       8,00                    8,90
16   BT                       GABOR ANDREEA-CATALINA        F       8,30                    8,98
17   BZ                       OPREA ALEXANDRU-ION           M       8,40                    9,00
18   AG                       IORDACHESCU MIHAI-CIPRIAN     M       8,40                    9,00
19   GJ                       DRAGOESCU LAVINIA             F       8,70                    9,00
20   PH                       MOCANU RAZVAN-DANIEL          M       8,70                    9,00
21   VL                       TURCU GEORGE-IONUT            M       8,70                    9,00
22   AG                       VASILE ROXANA-MARIA           F       8,80                    9,03
23   BC                       TIRON RADU-MARIAN             M       8,90                    9,05
24   AG                       MINCA IOANA-CATALINA          F       8,90                    9,05
25   OT                       CIOBANU MARIANA               F       8,90                    8,98
26   DJ                       BUTOI FLORIAN-COSMIN          M       8,90                    8,98
27   PH                       BEZNEA MIHAI-FLORIN           M       9,00                    9,00
28   OT                       ANDOR VLAD-CRISTIAN           M       9,00                    9,00
29   PH                       CONSTANTIN MADALIN            M       9,00                    9,00
30   BC                       SPANU CONSTANTIN              M       9,00                    9,00
31   BZ                       RACOREANU COSMIN-LAURENTIU    M       9,00                    9,00
32   AG                       ANGHELOIU LOREDANA-ELENA      F       9,00                    9,00
33   CT                       PETCULESCU ELENA-ADELINA      F       9,00                    9,00
34   GJ                       DRAGOTA LARISA-PETRUTA        F       9,00                    8,93
35   MH                       POPA IOANA                    F       9,00                    8,93
36   AG                       PATRU ANDREEA-GEORGIANA       F       9,00                    8,93
37   MM                       DRAGOMIR RADU-STEFAN          M       9,00                    8,93
38   BZ                       DIMOFTE CRISTIAN-DANIEL       M       9,10                    8,95
39   DJ                       VOICULESCU ROBERT-CRISTIAN    M       9,10                    8,95
40   SV                       VAMANU IONELA-DANIELA         F       9,20                    8,98
41   GJ                       OGORANU IONUT-ADRIAN          M       9,20                    8,90
42   NT                       PETRESCU TEDY-FLORIN          M       9,30                    8,93
43   DJ                       RADUT ROXANA-FLORENTINA       F       9,70                    9,03
44   BZ                       BESNEA CATALIN-GEORGE         M       9,80                    9,05
45   VN                       TANASE CATALIN-TITEL          F       9,80                    9,05
46   BZ                       MATEI RAZVAN-COSMIN           M       9,80                    9,05
47   GJ                       GRECU FLAVIUS-ANDREI          M       10,00                   9,03
48   SV                       IVANUTA LAURA-VASILICA        F       10,00                   9,03
49   IF                       SIMA MARIAN-IONUT             M       10,00                   9,03
50   DB                       ATANASIE ALIN-IONUT           M       10,00                   9,03

Source: http://www.academiadepolitie.ro/old/Facdepol/admitere/2011/rezultate/politie_drept%20-%20admisi.pdf
I For each of the two variables:
a) Calculate and interpret the average, standard deviation and the coefficient of variation for the raw data. Interpret the results. Is the data series homogeneous?
a.1) The average (or mean) is the arithmetic average of the scores (Baccalaureate Mark and Admission Mark). It is a measure of central tendency, i.e. a parameter that lets the researcher summarise a group of scores by a single typical value. To answer this question and interpret the result, I have to understand the relationship between the average, the median and the mode, and then interpret the skewness parameter in both cases (X and Y).
Firstly, by choosing the Descriptive Statistics tool from the Data Analysis ToolPak provided by MS Excel, I obtain an overview of the measures of central tendency, dispersion, skew and kurtosis of the data. Descriptive statistics help us to examine:
1. central tendency (location) of data, i.e. where data tend to fall, as measured by the mean, median, and mode.
2. dispersion (variability) of data, i.e. how spread out data are, as measured by the variance and its square root, the standard deviation.
3. skew (symmetry) of data, i.e. how concentrated data are at the low or high end of the scale, as measured by the skew index.
4. kurtosis (peakedness) of data, i.e. how concentrated data are around a single value, as measured by the kurtosis index.
Statistic                  Baccalaureate Mark (X)   Admission Mark (Y)
Mean                       8,6060                   8,9570
Standard Error             0,1352                   0,0106
Median                     8,9000                   8,9750
Mode                       9,0000                   9,0000
Standard Deviation         0,9561                   0,0751
Sample Variance            0,9140                   0,0056
Kurtosis                   -0,1215                  0,9812
Skewness                   -0,6893                  -1,1143
Range                      3,5000                   0,3000
Minimum                    6,5000                   8,7500
Maximum                    10,0000                  9,0500
Sum                        430,3000                 447,8500
Count                      50                       50
Largest(1)                 10,0000                  9,0500
Smallest(1)                6,5000                   8,7500
Confidence Level(95,0%)    0,2717                   0,0213
So, I computed the average of X and Y as follows:

           X      Y
AVERAGE    8,61   8,96
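The central-tendency and skew figures can be cross-checked outside Excel. The following is a minimal sketch, assuming the baccalaureate marks (X) exactly as listed in the data table; the helper name `sample_skewness` is illustrative, and it uses the adjusted sample-skewness formula that Excel documents for its SKEW function.

```python
from statistics import mean, median, mode, stdev

# Baccalaureate marks (X) as value -> number of students, read from the data table.
BAC_COUNTS = {6.5: 3, 6.8: 1, 7.0: 2, 7.5: 2, 7.8: 2, 7.9: 2, 8.0: 3, 8.3: 1,
              8.4: 2, 8.7: 3, 8.8: 1, 8.9: 4, 9.0: 11, 9.1: 2, 9.2: 2, 9.3: 1,
              9.7: 1, 9.8: 3, 10.0: 4}
x = [value for value, count in BAC_COUNTS.items() for _ in range(count)]


def sample_skewness(data):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((v - mean) / s) ** 3)."""
    n = len(data)
    m, s = mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in data)


x_mean = mean(x)
x_median = median(x)
x_mode = mode(x)
x_skew = sample_skewness(x)

print(f"mean={x_mean:.4f} median={x_median:.4f} mode={x_mode:.4f} skew={x_skew:.4f}")
```

A negative skew together with mean < median < mode is the pattern discussed in the interpretation that follows.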
Comparing the averages I computed for X and Y with those provided by the Descriptive Statistics tool, I find them identical. In both cases the average differs from the median and the mode, which suggests that the arrays (X and Y) are not normally distributed. Comparing the average with the median and the mode, I can see that average < median < mode in both cases (X: 8,61 < 8,90 < 9,00 and Y: 8,96 < 8,98 < 9,00), so these three parameters show that both distributions are non-normal (skewed): X and Y each have a non-bell-shaped distribution of scores.
Looking at the Skewness parameter provided by the Descriptive Statistics tool, I see that it is negative in both cases, so both arrays (X and Y) are negatively skewed (skewed left), meaning that the left tail is longer.
As the skewness of the X array (-0,6893) lies between -1 and -0,5, that distribution is moderately skewed. As the skewness of the Y array (-1,1143) is less than -1, that distribution is highly skewed. If data are very skewed, the arithmetic mean may be misleading, so in the Y data the average is not a good measure of central tendency, while in the X data the average may still be taken into consideration.
a.2) The standard deviation is the square root of the variance. It is a measure of variability, i.e. a parameter indicating how spread out a group of scores is. Computation in Excel:
                                                  X        Y
Total no. of observations (N)                     50       50
Variance ($\sigma^2$)                             0,9140   0,0056
Standard deviation ($\sigma = \sqrt{\sigma^2}$)   0,9561   0,0751

Squared deviations $(X - \bar{X})^2$, observations 1-50:
4,4352 4,4352 4,4352 3,2616 2,5792 2,5792 1,2232 1,2232 0,6496 0,6496
0,4984 0,4984 0,3672 0,3672 0,3672 0,0936 0,0424 0,0424 0,0088 0,0088
0,0088 0,0376 0,0864 0,0864 0,0864 0,0864 0,1552 0,1552 0,1552 0,1552
0,1552 0,1552 0,1552 0,1552 0,1552 0,1552 0,1552 0,2440 0,2440 0,3528
0,3528 0,4816 1,1968 1,4256 1,4256 1,4256 1,9432 1,9432 1,9432 1,9432

Squared deviations $(Y - \bar{Y})^2$, observations 1-50:
0,0003 0,0032 0,0428 0,0428 0,0246 0,0246 0,0114 0,0114 0,0010 0,0010
0,0000 0,0000 0,0032 0,0032 0,0032 0,0003 0,0018 0,0018 0,0018 0,0018
0,0018 0,0046 0,0086 0,0086 0,0003 0,0003 0,0018 0,0018 0,0018 0,0018
0,0018 0,0018 0,0018 0,0010 0,0010 0,0010 0,0010 0,0000 0,0000 0,0003
0,0032 0,0010 0,0046 0,0086 0,0086 0,0086 0,0046 0,0046 0,0046 0,0046
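The squared-deviation columns above can be reproduced step by step. This is a minimal sketch for X only, assuming the baccalaureate marks as listed in the data table and the sample formula (division by N - 1), which is what the Sample Variance row of the Descriptive Statistics output reports; variable names are illustrative.

```python
from math import sqrt

# Baccalaureate marks (X) as value -> number of students, read from the data table.
BAC_COUNTS = {6.5: 3, 6.8: 1, 7.0: 2, 7.5: 2, 7.8: 2, 7.9: 2, 8.0: 3, 8.3: 1,
              8.4: 2, 8.7: 3, 8.8: 1, 8.9: 4, 9.0: 11, 9.1: 2, 9.2: 2, 9.3: 1,
              9.7: 1, 9.8: 3, 10.0: 4}
x = [value for value, count in BAC_COUNTS.items() for _ in range(count)]

n = len(x)
x_mean = sum(x) / n

# One squared deviation per observation, as in the (X - mean)^2 column.
squared_deviations = [(value - x_mean) ** 2 for value in x]

# Sample variance divides by n - 1; the standard deviation is its square root.
sample_variance = sum(squared_deviations) / (n - 1)
sample_sd = sqrt(sample_variance)

print(f"first squared deviation = {squared_deviations[0]:.4f}")
print(f"variance = {sample_variance:.4f}, standard deviation = {sample_sd:.4f}")
```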
Comparing the standard deviations I computed for X and Y with those provided by the Descriptive Statistics tool, I find them identical: $\sigma_X$ = 0,9561 and $\sigma_Y$ = 0,0751. These values are of limited relevance as measures of dispersion around the average, because both X and Y have non-bell-shaped, negatively skewed distributions. Even so, comparing the two standard deviations shows that $\sigma_X$ is larger than $\sigma_Y$, although the Y average is larger than the X average. I conclude that the values in the X array vary widely, while the values in the Y array all lie close to the Y average.
a.3) Coefficient of variation: measures relative dispersion. I have chosen to express the coefficient of variation as a percentage, and the values computed are:
Coefficient of variation
$CV_X = \sigma_X / \bar{X}$ = 11%
$CV_Y = \sigma_Y / \bar{Y}$ = 1%
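As a quick arithmetic check, the two percentages follow directly from the means and standard deviations in the Descriptive Statistics output. The sketch below assumes those reported values as inputs; the function name is illustrative.

```python
def coefficient_of_variation(standard_deviation, average):
    """Relative dispersion: standard deviation divided by the mean."""
    return standard_deviation / average


# Inputs taken from the Descriptive Statistics output (decimal points instead of commas).
cv_x = coefficient_of_variation(0.9561, 8.6060)
cv_y = coefficient_of_variation(0.0751, 8.9570)

print(f"CV_X = {cv_x:.2%}, CV_Y = {cv_y:.2%}")
```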
The coefficient of variation values confirm what I concluded when interpreting the standard deviations. Once again, I can say that Y has a lower relative variability than X.
a.4) Is the data series homogeneous? Homogeneity describes how similar the values of a variable are to one another.
With an 11% coefficient of variation, the first array is the less homogeneous of the two, while the 1% coefficient of variation shows that the second array is highly homogeneous.
b) Summarize the data in an appropriate number of classes. Construct the frequency distribution.
b.1) To solve this point, I have to identify the lowest and highest values in the list for X and Y, so I compute the min and max:

Min X    Max X    Min Y    Max Y
6,50     10,00    8,75     9,05

Secondly, I have to compute the Range (Maximum Value - Minimum Value) for each variable (X, Y):

Range X   Range Y
3,50      0,30
Thirdly, I compute the number of classes k by using the grouping rule $2^k > N$.
In our case, I have N (no. of observations) = 50, so I'll have k = 6 classes. To identify the exact classes, I divide each range by k to establish the length of the interval, which gives, after rounding, Lx = 0,5 and Ly = 0,05.

For X:
LL       UL
6,500    7,000
7,000    7,500
7,500    8,000
8,000    8,500
8,500    9,000
9,500    10,000

For Y:
LL       UL
8,750    8,800
8,800    8,850
8,850    8,900
8,900    8,950
8,950    9,000
9,000    9,050
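The grouping rule can be sketched in a few lines. This is an illustrative example, assuming the rule means "the smallest k for which 2^k exceeds N"; the function name is hypothetical, and the unrounded widths it prints are simply range / k.

```python
def number_of_classes(n_observations):
    """Smallest k such that 2**k > n_observations."""
    k = 1
    while 2 ** k <= n_observations:
        k += 1
    return k


k = number_of_classes(50)

# Unrounded class widths: range divided by the number of classes.
width_x = 3.50 / k
width_y = 0.30 / k

print(f"k = {k}, width X = {width_x:.3f}, width Y = {width_y:.3f}")
```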
b.2) Frequency distribution for X:

LL       UL       Midpoint marks = mxi   No. of students = xfi
6,500    7,000    6,750                  6
7,000    7,500    7,250                  2
7,500    8,000    7,750                  7
8,000    8,500    8,250                  3
8,500    9,000    8,750                  19
9,500    10,000   9,750                  13
Total                                    50

Frequency distribution for Y:

LL       UL       Midpoint marks = myi   No. of students = yfi
8,750    8,800    8,775                  4
8,800    8,850    8,825                  1
8,850    8,900    8,875                  6
8,900    8,950    8,925                  9
8,950    9,000    8,975                  19
9,000    9,050    9,025                  11
Total                                    50
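The X counts can be reproduced from the raw marks. The sketch below is illustrative and rests on two assumptions that are not stated in the text: each class includes its upper limit (the first class also includes its lower limit), and the last class collects every mark above 9,00 up to 10,00, which is what its count of 13 implies.

```python
# Baccalaureate marks (X) as value -> number of students, read from the data table.
BAC_COUNTS = {6.5: 3, 6.8: 1, 7.0: 2, 7.5: 2, 7.8: 2, 7.9: 2, 8.0: 3, 8.3: 1,
              8.4: 2, 8.7: 3, 8.8: 1, 8.9: 4, 9.0: 11, 9.1: 2, 9.2: 2, 9.3: 1,
              9.7: 1, 9.8: 3, 10.0: 4}
x = [value for value, count in BAC_COUNTS.items() for _ in range(count)]


def class_frequencies(data, edges):
    """Count values per class (lower, upper]; the first class is [lower, upper]."""
    counts = [0] * (len(edges) - 1)
    for value in data:
        for i in range(len(counts)):
            above_lower = value >= edges[i] if i == 0 else value > edges[i]
            if above_lower and value <= edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Assumed class edges; the last class is taken as (9.0, 10.0].
edges_x = [6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 10.0]
frequencies_x = class_frequencies(x, edges_x)

print(frequencies_x, sum(frequencies_x))
```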
c) Calculate and interpret for the frequency distribution the average, standard deviation and coefficient of variation. Compare with the results from point a). Explain the differences.
c.1) Average of the frequency distribution ($\bar{X}$, $\bar{Y}$):

LL       UL       Midpoint marks = mxi   No. of students = xfi   mxi*xfi
6,500    7,000    6,750                  6                       40,500
7,000    7,500    7,250                  2                       14,500
7,500    8,000    7,750                  7                       54,250
8,000    8,500    8,250                  3                       24,750
8,500    9,000    8,750                  19                      166,250
9,500    10,000   9,750                  13                      126,750
Total                                    50                      427,000
$\bar{X}$ = 427,000 / 50 = 8,540

LL       UL       Midpoint marks = myi   No. of students = yfi   myi*yfi
8,750    8,800    8,775                  4                       35,100
8,800    8,850    8,825                  1                       8,825
8,850    8,900    8,875                  6                       53,250
8,900    8,950    8,925                  9                       80,325
8,950    9,000    8,975                  19                      170,525
9,000    9,050    9,025                  11                      99,275
Total                                    50                      447,300
$\bar{Y}$ = 447,300 / 50 = 8,946
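The grouped averages are weighted means of the class midpoints. A minimal sketch, using the midpoints and frequencies from the two tables above (the function name is illustrative):

```python
def grouped_mean(midpoints, frequencies):
    """Weighted mean of class midpoints: sum(m_i * f_i) / sum(f_i)."""
    weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
    return weighted_sum / sum(frequencies)


midpoints_x = [6.75, 7.25, 7.75, 8.25, 8.75, 9.75]
frequencies_x = [6, 2, 7, 3, 19, 13]

midpoints_y = [8.775, 8.825, 8.875, 8.925, 8.975, 9.025]
frequencies_y = [4, 1, 6, 9, 19, 11]

mean_x = grouped_mean(midpoints_x, frequencies_x)
mean_y = grouped_mean(midpoints_y, frequencies_y)

print(f"grouped mean X = {mean_x:.3f}, grouped mean Y = {mean_y:.3f}")
```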
c.2) Standard deviation of the frequency distribution ($\sigma$):

LL       UL       Midpoint marks = mxi   No. of students = xfi   mxi*xfi   $xfi(mxi - \bar{X})^2$
6,500    7,000    6,750                  6                       40,500    19,225
7,000    7,500    7,250                  2                       14,500    3,328
7,500    8,000    7,750                  7                       54,250    4,369
8,000    8,500    8,250                  3                       24,750    0,252
8,500    9,000    8,750                  19                      166,250   0,838
9,500    10,000   9,750                  13                      126,750   19,033
Total                                    50                      427,000   47,045
$\bar{X}$ = 8,540
Standard Deviation X = $\sqrt{47,045 / 50}$ = 0,970

LL       UL       Midpoint marks = myi   No. of students = yfi   myi*yfi   $yfi(myi - \bar{Y})^2$
8,750    8,800    8,775                  4                       35,100    0,117
8,800    8,850    8,825                  1                       8,825     0,015
8,850    8,900    8,875                  6                       53,250    0,030
8,900    8,950    8,925                  9                       80,325    0,004
8,950    9,000    8,975                  19                      170,525   0,016
9,000    9,050    9,025                  11                      99,275    0,069
Total                                    50                      447,300   0,250
$\bar{Y}$ = 8,946
Standard Deviation Y = $\sqrt{0,250 / 50}$ = 0,071
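The grouped standard deviations can be checked the same way. This sketch assumes division by the total number of observations N (not N - 1), since that is the divisor that reproduces the 0,970 and 0,071 reported above; the function name is illustrative.

```python
from math import sqrt


def grouped_standard_deviation(midpoints, frequencies):
    """sqrt(sum(f_i * (m_i - mean)^2) / N), with the mean taken from the grouped data."""
    n = sum(frequencies)
    grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    weighted_squares = sum(f * (m - grouped_mean) ** 2
                           for m, f in zip(midpoints, frequencies))
    return sqrt(weighted_squares / n)


sd_x = grouped_standard_deviation([6.75, 7.25, 7.75, 8.25, 8.75, 9.75],
                                  [6, 2, 7, 3, 19, 13])
sd_y = grouped_standard_deviation([8.775, 8.825, 8.875, 8.925, 8.975, 9.025],
                                  [4, 1, 6, 9, 19, 11])

print(f"grouped SD X = {sd_x:.3f}, grouped SD Y = {sd_y:.3f}")
```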
c.3) Coefficient of variation of the frequency distribution (CV):

     Average   Standard Deviation   Coefficient of variation
X    8,540     0,970                $CV_X = \sigma_X / \bar{X}$ = 11%
Y    8,946     0,071                $CV_Y = \sigma_Y / \bar{Y}$ = 1%
c.4) Compare with the results from point a). Explain the differences.
Values computed for the raw data:

                              X        Y
Average                       8,61     8,96
Standard deviation            0,9561   0,0751
Coefficient of variation      11%      1%

Values computed for the grouped data:

                              X        Y
Average                       8,54     8,95
Standard deviation            0,970    0,071
Coefficient of variation      11%      1%

There are small differences between the averages and standard deviations computed for the raw data and those computed for the grouped data. They arise because grouping replaces every observation with the midpoint of its class, so the grouped values are approximations, while the values computed from the raw data are exact.
d) Construct a histogram and describe the shape of the distribution based on the histogram.
d.1) Construct a histogram: Firstly, I created the interval labels (bins) by using the CONCATENATE function to merge the lower limit with the upper limit, and then I copied the frequencies as follows (I chose this approach because I find it more elegant than using the Histogram tool from the Data Analysis ToolPak):
HISTOGRAM X
Intervals   Xfi
6,5-7       6
7-7,5       2
7,5-8       7
8-8,5       3
8,5-9       19
9,5-10      13

HISTOGRAM Y
Intervals     Yfi
8,75 - 8,8    4
8,8 - 8,85    1
8,85 - 8,9    6
8,9 - 8,95    9
8,95 - 9      19
9 - 9,05      11
The next step was to copy and paste the values (Paste Special) into the sheet called HISTOGRAM, select the frequencies, click Insert - Column Chart, delete the series legend, right-click on the edge of the graph, choose Select Data, and enter the Intervals for the horizontal axis. Then I modified some Layout and Design elements and added a trendline; the charts now look like this:
[Figure: column chart "Frequency distribution of the baccalaureate marks". Vertical axis: Frequency (0 to 20); horizontal axis: Baccalaureate Marks grouped in classes (6,5-7 to 9,5-10).]
[Figure: column chart "Frequency distribution of the admission marks". Vertical axis: Frequency (0 to 20); horizontal axis: Admission Marks grouped in classes (8,75 - 8,8 to 9 - 9,05).]
d.2) Describe the shape of the distribution based on the histogram. Both histograms are heaped on the right with a longer tail on the left, which is why they are negatively skewed (and moderately skewed).
e) In which interval are about 95% of the data expected to fall? Does this assumption hold for these data?

X:
Interval        LL      UL      freq   Percentage of data
mean ± 1 SD     7,65    9,56    42     84%
mean ± 2 SD     6,69    10,52   8      16%
mean ± 3 SD     5,74    11,47   0      0%
total obs                       50

Y:
Interval        LL      UL      freq   Percentage of data
mean ± 1 SD     8,88    9,03    45     90%
mean ± 2 SD     8,81    9,11    5      10%
mean ± 3 SD     8,73    9,18    0      0%
total obs                       50
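For bell-shaped data, roughly 68%, 95% and 99,7% of observations are expected within 1, 2 and 3 standard deviations of the mean. The sketch below is an illustrative check for X only, assuming the baccalaureate marks as listed in the data table and the sample standard deviation. Because it tests both limits of each interval, its counts need not match a tally that bins the marks only by upper limit.

```python
from statistics import mean, stdev

# Baccalaureate marks (X) as value -> number of students, read from the data table.
BAC_COUNTS = {6.5: 3, 6.8: 1, 7.0: 2, 7.5: 2, 7.8: 2, 7.9: 2, 8.0: 3, 8.3: 1,
              8.4: 2, 8.7: 3, 8.8: 1, 8.9: 4, 9.0: 11, 9.1: 2, 9.2: 2, 9.3: 1,
              9.7: 1, 9.8: 3, 10.0: 4}
x = [value for value, count in BAC_COUNTS.items() for _ in range(count)]


def share_within(data, k):
    """Return (lower, upper, count, share) for the interval mean +/- k standard deviations."""
    m, s = mean(data), stdev(data)
    lower, upper = m - k * s, m + k * s
    count = sum(1 for value in data if lower <= value <= upper)
    return lower, upper, count, count / len(data)


results = {k: share_within(x, k) for k in (1, 2, 3)}

for k, (lower, upper, count, share) in results.items():
    print(f"mean +/- {k} SD: [{lower:.2f}, {upper:.2f}] -> {count} obs ({share:.0%})")
```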
II Using the Pivot Table Wizard in Excel, build a pivot table on your spreadsheet (using also the second variable). You may have to change the order of the rows (you should define the intervals first using the VLOOKUP function).
I start by creating a new worksheet called VLookUp, which contains the following columns: Selection unit (Unitatea selectoare), Gender, Baccalaureate Mark, Baccalaureate Categories, Admission Mark, Admission Categories.
The next step is to create 2 table arrays with the intervals and categories for each mark (baccalaureate and admission):

Table array: Baccalaureate
LL      Category
6,5     lucky
,25     normal
10      smart

Table array: Admission
LL      Category
8,75    extreme low
8,98    normal
9,05    extreme high

Using the VLOOKUP formula, I fill the Categories columns with the values presented above:
No.  Selection unit  Gender  Baccalaureate Mark  Baccalaureate Categories  Admission Mark  Admission Categories
1    AG              F       6,50                lucky                     8,75            extreme low
2    AG              F       6,50                lucky                     8,90            extreme low
3    AG              F       6,50                lucky                     8,98            extreme low
4    AR              F       6,80                lucky                     8,75            extreme low
5    AG              M       7,00                lucky                     8,80            extreme low
6    AG              M       7,00                lucky                     8,80            extreme low
7    DJ              F       7,50                lucky                     8,85            extreme low
8    AG              F       7,50                lucky                     8,85            extreme low
9    AG              M       7,80                lucky                     8,93            extreme low
10   CS              M       7,80                lucky                     8,93            extreme low
11   BR              M       7,90                lucky                     8,95            extreme low
12   AG              M       7,90                lucky                     8,95            extreme low
13   SV              F       8,00                lucky                     8,90            extreme low
14   BT              F       8,00                lucky                     8,90            extreme low
15   AG              M       8,00                lucky                     8,90            extreme low
16   BT              F       8,30                normal                    8,98            extreme low
17   BZ              M       8,40                normal                    9,00            normal
18   AG              M       8,40                normal                    9,00            normal
19   GJ              F       8,70                normal                    9,00            normal
20   PH              M       8,70                normal                    9,00            normal
21   VL              M       8,70                normal                    9,00            normal
22   AG              F       8,80                normal                    9,03            normal
23   BC              F       8,90                normal                    8,98            extreme low
24   AG              M       8,90                normal                    8,98            extreme low
25   OT              M       8,90                normal                    9,05            normal
26   DJ              F       8,90                normal                    9,05            extreme high
27   PH              F       9,00                normal                    8,93            extreme low
28   OT              F       9,00                normal                    8,93            extreme low
29   PH              F       9,00                normal                    8,93            extreme low
30   BC              M       9,00                normal                    8,93            extreme low
31   BZ              M       9,00                normal                    9,00            normal
32   AG              M       9,00                normal                    9,00            normal
33   CT              M       9,00                normal                    9,00            normal
34   GJ              M       9,00                normal                    9,00            normal
35   MH              M       9,00                normal                    9,00            normal
36   AG              F       9,00                normal                    9,00            normal
37   MM              F       9,00                normal                    9,00            normal
38   BZ              M       9,10                normal                    8,95            extreme low
39   DJ              M       9,10                normal                    8,95            extreme low
40   SV              M       9,20                normal                    8,90            extreme low
41   GJ              F       9,20                normal                    8,98            extreme low
42   NT              M       9,30                normal                    8,93            extreme low
43   DJ              F       9,70                normal                    9,03            normal
44   BZ              M       9,80                normal                    9,05            extreme high
45   VN              F       9,80                normal                    9,05            extreme high
46   BZ              M       9,80                normal                    9,05            extreme high
47   GJ              M       10,00               smart                     9,03            normal
48   SV              M       10,00               smart                     9,03            normal
49   IF              F       10,00               smart                     9,03            normal
50   DB              M       10,00               smart                     9,03            normal
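An approximate-match lookup of this kind returns the category of the largest lower limit that does not exceed the mark. The sketch below imitates that behaviour with a binary search, using the admission table array above; the function and constant names are illustrative, not part of the workbook.

```python
from bisect import bisect_right

# Admission table array: lower limits (sorted ascending) and their categories.
ADMISSION_LIMITS = [8.75, 8.98, 9.05]
ADMISSION_LABELS = ["extreme low", "normal", "extreme high"]


def lookup_category(mark, limits, labels):
    """Return the label of the largest lower limit that is <= mark."""
    position = bisect_right(limits, mark) - 1
    if position < 0:
        raise ValueError(f"{mark} is below the smallest lower limit {limits[0]}")
    return labels[position]


for mark in (8.75, 8.93, 9.00, 9.05):
    print(mark, "->", lookup_category(mark, ADMISSION_LIMITS, ADMISSION_LABELS))
```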
Using the data above, I clicked the PivotTable button on the Insert tab and created the pivot table below:
Report filters: Admission Categories = (All); Baccalaureate Categories = (All)
Values: Count of Gender
Column labels: Admission Mark (9,05; 9,03; 9,00; 8,98; 8,95; 8,93; 8,90; 8,85; 8,80; 8,75)
Row labels: Selection unit (Unitatea selectoare)

Selection unit   Grand Total
AG               14
AR               1
BC               2
BR               1
BT               2
BZ               5
CS               1
CT               1
DB               1
DJ               4
GJ               4
IF               1
MH               1
MM               1
NT               1
OT               2
PH               3
SV               3
VL               1
VN               1
Grand Total      50
The pivot table shows us:
- most of the students enrolled this year came from Arges county (14 students from AG);
- the most common admission mark among the enrolled students was 9,00 (12 students). Notice that we can make the same observations for the baccalaureate marks too if we switch Admission Mark with Baccalaureate Mark in the Column Labels section of the pivot table.
- By choosing the extreme high admission category in the report filter, we can find which mark was the extreme high mark at the admission exam, how many students obtained it and from which counties:
Report filters: Admission Categories = extreme high; Baccalaureate Categories = (All)
Values: Count of Gender; column labels: Admission Mark; row labels: Selection unit (Unitatea selectoare)

Selection unit   9,05   Grand Total
BZ               2      2
DJ               1      1
VN               1      1
Grand Total      4      4
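The row totals of such a pivot table are simple counts per selection unit. As an illustrative sketch (not the Excel procedure itself), the same tally can be made from the county column of the data table:

```python
from collections import Counter

# Selection units (counties) of the 50 admitted candidates, in data-table order.
COUNTIES = (
    "AG AG AG AR AG AG DJ AG AG CS BR AG SV BT AG BT BZ AG GJ PH VL "
    "AG BC AG OT DJ PH OT PH BC BZ AG CT GJ MH AG MM BZ DJ SV GJ NT DJ "
    "BZ VN BZ GJ SV IF DB"
).split()

counts = Counter(COUNTIES)
top_county, top_count = counts.most_common(1)[0]

# Print the tally in alphabetical order, like the pivot table rows.
for county in sorted(counts):
    print(county, counts[county])
print("Grand Total", sum(counts.values()))
```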
III Calculate the regression line and interpret and test the regression coefficients, coefficient of determination and coefficient of correlation. Interpret the results.
a) Calculate the regression line
The regression line, y = a + bx, can be computed from the system of normal equations $\sum y = na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$, with n = 50, $\sum x$ = 430,3, $\sum y$ = 447,85, $\sum xy$ = 3856,865 and $\sum x^2$ = 3747,95:

447,85 = 50a + 430,3b
3856,865 = 430,3a + 3747,95b

a = (447,85 - 430,3b)/50
3856,865 = 430,3(447,85 - 430,3b)/50 + 3747,95b

b = 0,059
a = 8,444

Regression output:
                     Coefficients
Intercept            8,44436615
Baccalaureate Mark   0,059567029
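The two normal equations have a closed-form solution. The following sketch solves them from the sums used above (n, the sum of x, the sum of y, the sum of xy and the sum of x squared); variable names are illustrative.

```python
# Sums for the 50 observations (decimal points instead of commas).
n = 50
sum_x = 430.3        # baccalaureate marks
sum_y = 447.85       # admission marks
sum_xy = 3856.865
sum_x2 = 3747.95

# Closed-form least-squares solution of the normal equations for y = a + b*x.
slope_b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept_a = (sum_y - slope_b * sum_x) / n

print(f"b = {slope_b:.6f}, a = {intercept_a:.5f}")
```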
[Figure: scatter plot "Bac grades vs Admission Mark" with fitted trendline. Horizontal axis: BAC grades (6,5 to 10); vertical axis: Admission Marks (8,7 to 9,1). Trendline: y = 0,059x + 8,444; $R^2$ = 0,575; r = 0,758.]
The a parameter is 8,444; it is the intercept of the regression function and has no economic significance. Geometrically, it is the point where the regression line intersects the OY axis (x = 0). The b parameter, 0,059, is the slope of the regression line and is called the regression coefficient. Because it is positive, the relationship between the two marks is positive. This parameter shows that when the Baccalaureate mark increases by 1 point, the Admission mark increases on average by 0,059 points.
b) Test the regression coefficients, coefficient of determination and coefficient of correlation.

Regression Statistics
Multiple R           0,758398211
R Square             0,575167847
Adjusted R Square    0,566317177
Standard Error       0,049451391
Observations         50
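The coefficient of correlation:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = 0{,}758$$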
$R^2$ = 0,575
Interpreting the coefficient of correlation (Multiple R = 0,758398211), we can say that there is a positive relationship between the two variables; since the coefficient is fairly close to 1, the relationship between the baccalaureate marks and the admission marks is strong (r ≈ 0,76).
The coefficient of determination R Square is 0,575167847, meaning that 57,52% of the variation in the Admission Marks can be explained by the variation in the Baccalaureate Marks, and the remaining 42,48% by other factors. The Adjusted R Square value is 0,566317177, meaning that, after adjusting for the number of predictors, about 56,6% of the variation in the Admission Marks is explained by the regression model y = 0,059x + 8,444.
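The link between the three reported statistics can be verified numerically. This is a small sketch, assuming the standard adjustment formula for a model with one predictor; the inputs are the Multiple R value and the number of observations from the regression output.

```python
multiple_r = 0.758398211   # coefficient of correlation from the regression output
n_observations = 50
n_predictors = 1           # only the baccalaureate mark

# Coefficient of determination: the square of the correlation coefficient.
r_square = multiple_r ** 2

# Adjusted R Square penalises R Square for the number of predictors.
adjusted_r_square = 1 - (1 - r_square) * (n_observations - 1) / (
    n_observations - n_predictors - 1
)

print(f"R Square = {r_square:.6f}, Adjusted R Square = {adjusted_r_square:.6f}")
```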