CASE STUDY
Student: FOCSA CRISTINA Master program: International Project Management Course: MANAGERIAL DATA ANALYSIS
Admission - session July 2011, "Alexandru Ioan Cuza" Police Academy, List of admitted candidates, Police - Law
X = independent variable (Baccalaureate Mark); Y = dependent variable (Admission Mark); County = selection unit.

County  Surname and first name          Gender  Baccalaureate Mark (X)  Admission Mark (Y)
AG      GHINESCU ANDREEA-FLORENTINA     F       6,50                    8,98
AG      MORARU ANCA-NICOLETA            F       6,50                    8,90
AG      ALEXE SIDONIA-CRESCENZIA        F       6,50                    8,75
AR      RADU LUIZA-CLAUDIA              F       6,80                    8,75
AG      ARSENE STEFAN-ALEXANDRU         M       7,00                    8,80
AG      FLORESCU IONUT-ANDREI           M       7,00                    8,80
DJ      MARCULESCU ROXANA               F       7,50                    8,85
AG      TRACHE CAMELIA-ELENA            F       7,50                    8,85
AG      CALUGAROIU LIVIU-MARIAN         M       7,80                    8,93
CS      TEODORESCU SILVIU-PETRU         M       7,80                    8,93
BR      STOILESCU FLORIAN               M       7,90                    8,95
AG      ANITA ALEXANDRU                 M       7,90                    8,95
SV      DUMITRAS CORINA-LAVINIA         F       8,00                    8,90
BT      ISTRATE DANIELA-ANDREEA         F       8,00                    8,90
AG      CIOBANU DANIEL-VALENTIN         M       8,00                    8,90
BT      GABOR ANDREEA-CATALINA          F       8,30                    8,98
BZ      OPREA ALEXANDRU-ION             M       8,40                    9,00
AG      IORDACHESCU MIHAI-CIPRIAN       M       8,40                    9,00
GJ      DRAGOESCU LAVINIA               F       8,70                    9,00
PH      MOCANU RAZVAN-DANIEL            M       8,70                    9,00
VL      TURCU GEORGE-IONUT              M       8,70                    9,00
AG      VASILE ROXANA-MARIA             F       8,80                    9,03
BC      TIRON RADU-MARIAN               M       8,90                    9,05
AG      MINCA IOANA-CATALINA            F       8,90                    9,05
OT      CIOBANU MARIANA                 F       8,90                    8,98
DJ      BUTOI FLORIAN-COSMIN            M       8,90                    8,98
PH      BEZNEA MIHAI-FLORIN             M       9,00                    9,00
OT      ANDOR VLAD-CRISTIAN             M       9,00                    9,00
PH      CONSTANTIN MADALIN              M       9,00                    9,00
BC      SPANU CONSTANTIN                M       9,00                    9,00
BZ      RACOREANU COSMIN-LAURENTIU      M       9,00                    9,00
AG      ANGHELOIU LOREDANA-ELENA        F       9,00                    9,00
CT      PETCULESCU ELENA-ADELINA        F       9,00                    9,00
GJ      DRAGOTA LARISA-PETRUTA          F       9,00                    8,93
MH      POPA IOANA                      F       9,00                    8,93
AG      PATRU ANDREEA-GEORGIANA         F       9,00                    8,93
MM      DRAGOMIR RADU-STEFAN            M       9,00                    8,93
BZ      DIMOFTE CRISTIAN-DANIEL         M       9,10                    8,95
DJ      VOICULESCU ROBERT-CRISTIAN      M       9,10                    8,95
SV      VAMANU IONELA-DANIELA           F       9,20                    8,98
GJ      OGORANU IONUT-ADRIAN            M       9,20                    8,90
NT      PETRESCU TEDY-FLORIN            M       9,30                    8,93
DJ      RADUT ROXANA-FLORENTINA         F       9,70                    9,03
BZ      BESNEA CATALIN-GEORGE           M       9,80                    9,05
VN      TANASE CATALIN-TITEL            F       9,80                    9,05
BZ      MATEI RAZVAN-COSMIN             M       9,80                    9,05
GJ      GRECU FLAVIUS-ANDREI            M       10,00                   9,03
SV      IVANUTA LAURA-VASILICA          F       10,00                   9,03
IF      SIMA MARIAN-IONUT               M       10,00                   9,03
DB      ATANASIE ALIN-IONUT             M       10,00                   9,03

Source: http://www.academiadepolitie.ro/old/Facdepol/admitere/2011/rezultate/politie_drept%20-%20admisi.pdf
I For each of the two variables:
a) Calculate and interpret the average, standard deviation and the coefficient of variation for raw data. Interpret the results. Is the data series homogeneous?
a.1) The average (or mean) is the arithmetic average of the scores (Baccalaureate Mark and Admission Mark). It is a measure of central tendency, i.e. a parameter that lets the researcher summarize a group of scores by a single typical value.
To answer this question and interpret the result, I have to understand the relationship between the average, the median and the mode, and then interpret the skewness parameter in both cases (X and Y).
Firstly, by choosing the Descriptive Statistics tool from the Data Analysis ToolPak provided by MS EXCEL, I get an overview of the measures of central tendency, dispersion, skew and kurtosis of the data. Descriptive statistics help us examine:
1. central tendency (location) of data, i.e. where data tend to fall, as measured by the mean, median, and mode.
2. dispersion (variability) of data, i.e. how spread out data are, as measured by the variance and its square root, the standard deviation.
3. skew (symmetry) of data, i.e. how concentrated data are at the low or high end of the scale, as measured by the skew index.
4. kurtosis (peakedness) of data, i.e. how concentrated data are around a single value, as measured by the kurtosis index.
Statistic            Baccalaureate Mark (X)   Admission Mark (Y)
Mean                 8,6060                   8,9570
Standard Error       0,1352                   0,0106
Median               8,9000                   8,9750
Mode                 9,0000                   9,0000
Standard Deviation   0,9561                   0,0751
Sample Variance      0,9140                   0,0056
Kurtosis             -0,1215                  0,9812
Skewness             -0,6893                  -1,1143
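As a cross-check outside Excel, the sketch below recomputes the Baccalaureate Mark (X) column of the Descriptive Statistics output with Python's standard `statistics` module. The `BAC_COUNTS` list is an assumption of this sketch: it is simply the X column of the data table condensed into (mark, number of students) pairs. The Admission Mark column is not recomputed here, because the listed marks are shown rounded to two decimals.

```python
import statistics

# Baccalaureate marks (X) as (mark, number of students) pairs,
# condensed from the data table (assumption of this sketch).
BAC_COUNTS = [
    (6.5, 3), (6.8, 1), (7.0, 2), (7.5, 2), (7.8, 2), (7.9, 2), (8.0, 3),
    (8.3, 1), (8.4, 2), (8.7, 3), (8.8, 1), (8.9, 4), (9.0, 11), (9.1, 2),
    (9.2, 2), (9.3, 1), (9.7, 1), (9.8, 3), (10.0, 4),
]
x = [mark for mark, count in BAC_COUNTS for _ in range(count)]

n = len(x)
mean_x = statistics.mean(x)
median_x = statistics.median(x)
mode_x = statistics.mode(x)
variance_x = statistics.variance(x)   # sample variance (divides by n - 1)
std_dev_x = statistics.stdev(x)       # sample standard deviation
std_error_x = std_dev_x / n ** 0.5    # standard error of the mean

for label, value in [
    ("Mean", mean_x),
    ("Standard Error", std_error_x),
    ("Median", median_x),
    ("Mode", mode_x),
    ("Standard Deviation", std_dev_x),
    ("Sample Variance", variance_x),
]:
    print(f"{label:<20}{value:.4f}")
```

Using `statistics.variance` and `statistics.stdev` (rather than the population versions) mirrors the "Sample Variance" row of the Excel output.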
AVERAGE X = 8,61
AVERAGE Y = 8,96
Comparing the averages I computed for X and Y with those provided by the Descriptive Statistics tool, I find them identical. In both cases the average differs from the median and the mode, which suggests that the arrays (X and Y) are not normally distributed. More precisely, average < median < mode in both cases (X: 8,61 < 8,90 < 9,00 and Y: 8,957 < 8,975 < 9,000), so these three parameters indicate skewed, non-bell-shaped distributions of scores for both X and Y.
Looking at the Skewness parameter provided by the Descriptive Statistics tool, I see that it is negative in both cases, so both arrays (X and Y) are negatively skewed (skewed left), meaning that the left tail is longer.
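To make the skewness index less of a black box, here is a minimal sketch of the bias-adjusted sample skewness, the form documented for Excel's SKEW function. The function name `sample_skewness` and the condensed `BAC_COUNTS` list are assumptions of this sketch, not part of the original workbook.

```python
import math

def sample_skewness(values):
    """Bias-adjusted sample skewness: n / ((n-1)(n-2)) * sum(((v - mean) / s) ** 3)."""
    n = len(values)
    mean = sum(values) / n
    # s is the sample standard deviation (divides by n - 1)
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in values)

# Baccalaureate marks (X), condensed from the data table as (mark, count) pairs.
BAC_COUNTS = [
    (6.5, 3), (6.8, 1), (7.0, 2), (7.5, 2), (7.8, 2), (7.9, 2), (8.0, 3),
    (8.3, 1), (8.4, 2), (8.7, 3), (8.8, 1), (8.9, 4), (9.0, 11), (9.1, 2),
    (9.2, 2), (9.3, 1), (9.7, 1), (9.8, 3), (10.0, 4),
]
x = [mark for mark, count in BAC_COUNTS for _ in range(count)]

skew_x = sample_skewness(x)
print(f"Skewness of X: {skew_x:.4f}")  # negative value = longer left tail
```

A negative result means the few low marks pull the mean below the median, which is the pattern average < median < mode described above.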
As the skewness of the X array (-0,6893) is between -1 and -0,5, its distribution is moderately skewed. As the skewness of the Y array (-1,1143) is less than -1, its distribution is highly skewed. If the data is very skewed, the arithmetic mean may become misleading, so we can conclude that for the Y data the average is not a good measure of central tendency, while for the X data the average may still be taken into consideration.
a.2) Standard deviation is the square root of the variance. It is a measure of variability, i.e. a parameter that indicates how spread out a group of scores is. Computation in Excel:
Total no. of observations: N = 50
Standard deviation: $\sigma_X = \sqrt{\sum_i (X_i - \bar{X})^2 / (N - 1)}$, and likewise for Y.

Squared deviations $(X_i - \bar{X})^2$, in the order of the data table:
4,4352 4,4352 4,4352 3,2616 2,5792 2,5792 1,2232 1,2232 0,6496 0,6496
0,4984 0,4984 0,3672 0,3672 0,3672 0,0936 0,0424 0,0424 0,0088 0,0088
0,0088 0,0376 0,0864 0,0864 0,0864 0,0864 0,1552 0,1552 0,1552 0,1552
0,1552 0,1552 0,1552 0,1552 0,1552 0,1552 0,1552 0,2440 0,2440 0,3528
0,3528 0,4816 1,1968 1,4256 1,4256 1,4256 1,9432 1,9432 1,9432 1,9432
Variance of X = 0,9140; standard deviation of X = 0,9561

Squared deviations $(Y_i - \bar{Y})^2$, in the order of the data table:
0,0003 0,0032 0,0428 0,0428 0,0246 0,0246 0,0114 0,0114 0,0010 0,0010
0,0000 0,0000 0,0032 0,0032 0,0032 0,0003 0,0018 0,0018 0,0018 0,0018
0,0018 0,0046 0,0086 0,0086 0,0003 0,0003 0,0018 0,0018 0,0018 0,0018
0,0018 0,0018 0,0018 0,0010 0,0010 0,0010 0,0010 0,0000 0,0000 0,0003
0,0032 0,0010 0,0046 0,0086 0,0086 0,0086 0,0046 0,0046 0,0046 0,0046
Variance of Y = 0,0056; standard deviation of Y = 0,0751
Comparing the standard deviations I computed for X and Y with those provided by the Descriptive Statistics tool, I find them identical: $\sigma_X$ = 0,9561 and $\sigma_Y$ = 0,0751. These values are of limited relevance as measures of dispersion around the average, because in both cases (X and Y) the distributions are not bell-shaped but negatively skewed. Even so, comparing the two standard deviations shows that $\sigma_X$ is larger than $\sigma_Y$, even though the Y average is larger than the X average. I conclude that the values of the X array are widely spread, while the values of the Y array all lie close to the Y average.
a.3) Coefficient of variation: measures relative dispersion. I have chosen to express the coefficient of variation as a percentage, and the computed values are:
Coefficient of variation
$CV_X = \sigma_X / \bar{X}$ = 11%
$CV_Y = \sigma_Y / \bar{Y}$ = 1%
The coefficient of variation values confirm what I concluded when interpreting the standard deviations. Once again, I can say that Y has a lower relative variability than X.
a.4) Is the data series homogeneous? Homogeneity describes how similar the values of a series are to one another.
With a coefficient of variation of 11%, the first array (X) is noticeably less homogeneous than the second array (Y), whose coefficient of variation of 1% indicates very high homogeneity.
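The two coefficients of variation can be reproduced directly from the reported means and standard deviations. The sketch below is only an illustration: the function name `coefficient_of_variation` is hypothetical, and the inputs are the rounded values from the Descriptive Statistics output.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative dispersion: standard deviation divided by the mean."""
    return std_dev / mean

# Inputs: standard deviation and mean as reported by the Descriptive Statistics tool.
cv_x = coefficient_of_variation(0.9561, 8.6060)
cv_y = coefficient_of_variation(0.0751, 8.9570)

print(f"CV of X: {cv_x:.1%}")
print(f"CV of Y: {cv_y:.1%}")
```

Because the coefficient is unit-free, it allows the spread of X and Y to be compared even though their standard deviations differ by an order of magnitude.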
b) Summarize the data in an appropriate number of classes. Construct the frequency distribution. b.1) To solve this point, I have to identify the lowest and highest values in the list for X and Y, so I compute the min and max:
Min X   Max X   Min Y   Max Y
6,50    10,00   8,75    9,05
Secondly, I have to compute the Range (Maximum Value - Minimum Value) for each variable (X, Y):
Range X   Range Y
3,50      0,30
Thirdly, I choose the number of classes k as the smallest integer for which $2^k > N$.
In our case, N (no. of observations) = 50, so I'll have k = 6 classes ($2^6 = 64 > 50$). To identify the exact classes, I divide each range by k to establish the length of the interval (Lx = 0,5 and Ly = 0,05). For X:
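Class width 0,5; the first class is 6,500 - 7,000 (midpoint 6,750), the second is 7,000 - 7,500, and the last listed class is 9,500 - 10,000.
Frequencies of the six classes, in order: 6, 2, 7, 3, 19, 13 (total 50).
For Y:
LL      UL      Midpoint   Frequency
8,750   8,800   8,775      4
8,800   8,850   8,825      1
8,850   8,900   8,875      6
8,900   8,950   8,925      9
8,950   9,000   8,975      19
9,000   9,050   9,025      11
Total                      50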
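The choice of k = 6 is consistent with the "2 to the k" rule (smallest k with 2^k greater than N); treating that as the rule in use is an assumption of this sketch, as are the helper names `number_of_classes` and `class_width`.

```python
def number_of_classes(n_obs):
    """Smallest k such that 2**k > n_obs (the '2 to the k' rule)."""
    k = 1
    while 2 ** k <= n_obs:
        k += 1
    return k

def class_width(minimum, maximum, k):
    """Length of each class interval: range divided by the number of classes."""
    return (maximum - minimum) / k

k = number_of_classes(50)
width_x = class_width(6.50, 10.00, k)   # Baccalaureate Mark
width_y = class_width(8.75, 9.05, k)    # Admission Mark

print(f"k = {k}, Lx = {width_x:.2f}, Ly = {width_y:.2f}")
```

Under this rule the X width comes out slightly above the 0,5 used in the text, so the text's value is best read as a rounded, more convenient class length.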
c) Calculate and interpret for the frequency distribution the average, standard deviation and coefficient of variation. Compare with the results from point a). Explain the differences.
c.1) Average of the frequency distribution:
For X (table columns: LL, UL, Midpoint marks = mxi, No students = xfi, mxi*xfi):
xfi = 6, 2, 7, 3, 19, 13 (total 50); average of the frequency distribution = 8,540
For Y (table columns: LL, UL, Midpoint marks = myi, No students = yfi, myi*yfi):
yfi = 4, 1, 6, 9, 19, 11 (total 50); total of myi*yfi = 447,300; average of the frequency distribution = 8,946
c.2) Standard deviation of the frequency distribution: the table for Y gains the column $yfi \cdot (myi - \bar{Y})^2$, whose total is 0,250.
c.3) Coefficient of variation of the frequency distribution:
$CV_X = \sigma_X / \bar{X}$ = 11%
$CV_Y = \sigma_Y / \bar{Y}$ = 1%
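The grouped-data calculation for Y can be sketched as follows, using the six Admission Mark classes and their frequencies. Dividing by N - 1 in the standard deviation is an assumption of this sketch (the divisor used in the workbook did not survive), and all variable names are illustrative.

```python
import math

# Admission Mark classes as (lower limit, upper limit, number of students).
Y_CLASSES = [
    (8.75, 8.80, 4), (8.80, 8.85, 1), (8.85, 8.90, 6),
    (8.90, 8.95, 9), (8.95, 9.00, 19), (9.00, 9.05, 11),
]

midpoints = [(low + high) / 2 for low, high, _ in Y_CLASSES]
freqs = [freq for _, _, freq in Y_CLASSES]

total = sum(freqs)
weighted_sum = sum(m * f for m, f in zip(midpoints, freqs))
grouped_mean = weighted_sum / total
# Sum of frequency-weighted squared deviations from the grouped mean
sum_sq = sum(f * (m - grouped_mean) ** 2 for m, f in zip(midpoints, freqs))
grouped_sd = math.sqrt(sum_sq / (total - 1))   # assumption: sample divisor
grouped_cv = grouped_sd / grouped_mean

print(f"sum(m*f) = {weighted_sum:.3f}, mean = {grouped_mean:.3f}")
print(f"sum f*(m-mean)^2 = {sum_sq:.3f}, sd = {grouped_sd:.4f}, CV = {grouped_cv:.1%}")
```

Every student is represented by the midpoint of their class here, which is why grouped results differ slightly from those computed on the individual marks.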
c.4) Compare with the results from point a). Explain the differences.
Values computed for the raw data:
Average_X: 8,61
Average_Y: 8,96
Standard_deviation_X: 0,9561
Standard_deviation_Y: 0,0751
Coefficient_of_variation_X: 11%
Coefficient_of_variation_Y: 1%
There are small differences between the averages and standard deviations computed for the raw data and those computed for the grouped data (for example, the averages are 8,61 and 8,96 for the raw data versus 8,540 and 8,946 for the frequency distribution). The differences arise because grouping replaces every observation with the midpoint of its class; the grouped values are therefore approximations, and the values computed from the raw data are the more accurate ones.
d) Construct a histogram and describe the shape of the distribution based on the histogram.
d.1) Construct a histogram: Firstly, I created the intervals (bins) by using the CONCATENATE function to merge the lower limit with the upper limit, and then I copied the frequencies as follows (I chose this approach because I find it more elegant than using the Histogram tool from the Data Analysis ToolPak):
HISTOGRAM X: Xfi = 6, 2, 7, 3, 19, 13
HISTOGRAM Y: Yfi = 4, 1, 6, 9, 19, 11
The next step was to copy and paste special (values) into the sheet called HISTOGRAM, select the frequencies, click Insert - Column Chart, delete the Series legend, right-click on the edge of the graph, choose Select Data, and enter the intervals for the horizontal axis. Then I modified some Layout and Design elements and added a trendline, and the charts now look like this:
[Figure: Frequency distribution of the baccalaureate marks. Column chart; vertical axis: Frequency.]
[Figure: column chart for Y. Vertical axis: Frequency; horizontal axis intervals: 8,75 - 8,8; 8,8 - 8,85; 8,85 - 8,9; 8,9 - 8,95; 8,95 - 9; 9 - 9,05.]
d.2) Describe the shape of the distribution based on the histogram. Both histograms are heaped on the right, with a longer tail on the left; this is why both distributions are negatively skewed (judging by the histograms alone, moderately so).
e) In which interval is expected that about 95% of the data will fall? Is this assumption true for this data?
Interval             X: LL   X: UP   freq   Percentage of data   Y: LL   Y: UP   freq   Percentage of data
mean ± 1 std. dev.   7,65    9,56    42     84%                  8,88    9,03    45     90%
mean ± 2 std. dev.   6,69    10,52   8      16%                  8,81    9,11    5      10%
mean ± 3 std. dev.   5,74    11,47   0      0%                   8,73    9,18    0      0%
total obs                            50                                          50
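For a bell-shaped distribution, roughly 95% of the data is expected within two standard deviations of the mean. The sketch below checks this directly on the Baccalaureate marks; `share_within` is a hypothetical helper, and the marks are the X column condensed into (mark, count) pairs. Counts computed this way are cumulative (everything inside mean ± k standard deviations), so they need not coincide with the frequencies tabulated above, which depend on how the bands were counted in the spreadsheet.

```python
import statistics

# Baccalaureate marks (X), condensed from the data table as (mark, count) pairs.
BAC_COUNTS = [
    (6.5, 3), (6.8, 1), (7.0, 2), (7.5, 2), (7.8, 2), (7.9, 2), (8.0, 3),
    (8.3, 1), (8.4, 2), (8.7, 3), (8.8, 1), (8.9, 4), (9.0, 11), (9.1, 2),
    (9.2, 2), (9.3, 1), (9.7, 1), (9.8, 3), (10.0, 4),
]
x = [mark for mark, count in BAC_COUNTS for _ in range(count)]

mean_x = statistics.mean(x)
sd_x = statistics.stdev(x)  # sample standard deviation

def share_within(values, mean, sd, k):
    """Return (lower limit, upper limit, count inside, share inside) for mean ± k*sd."""
    low, high = mean - k * sd, mean + k * sd
    inside = sum(1 for v in values if low <= v <= high)
    return low, high, inside, inside / len(values)

results = {k: share_within(x, mean_x, sd_x, k) for k in (1, 2, 3)}
for k, (low, high, inside, share) in results.items():
    print(f"mean ± {k} sd: [{low:.2f}; {high:.2f}]  count = {inside}  share = {share:.0%}")
```

Comparing the share for k = 2 with 95% gives a direct answer to whether the assumption holds for a given variable.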
II Using the Pivot Table Wizard in EXCEL, build a pivot table on your spreadsheet (using also the second variable). You may have to change the order of the rows (You should define the intervals first using VLookup function). I start by creating a new work sheet called VLookUp which contains the following columns:
Unitatea selectoare (selection unit) | Gender | Baccalaureate Mark | Baccalaureate Categories | Admission Mark | Admission Categories
The next step will be to create 2 table arrays with the intervals and categories for each mark (baccalaureate and admission):
Baccalaureate table array:
LL      Category
6,5     lucky
8,25    normal
10      smart
Admission table array:
LL      Category
8,75    extreme low
8,98    normal
9,05    extreme high
Using the VLookUp formula, I fill the Categories columns with the values presented above:
Unitatea selectoare  Gender  Baccalaureate Mark  Baccalaureate Categories  Admission Mark  Admission Categories
AG                   F       6,50                lucky                     8,75            extreme low
AG                   F       6,50                lucky                     8,90            extreme low
AG                   F       6,50                lucky                     8,98            extreme low
AR                   F       6,80                lucky                     8,75            extreme low
AG                   M       7,00                lucky                     8,80            extreme low
AG                   M       7,00                lucky                     8,80            extreme low
DJ                   F       7,50                lucky                     8,85            extreme low
AG                   F       7,50                lucky                     8,85            extreme low
AG                   M       7,80                lucky                     8,93            extreme low
CS                   M       7,80                lucky                     8,93            extreme low
BR                   M       7,90                lucky                     8,95            extreme low
AG                   M       7,90                lucky                     8,95            extreme low
SV                   F       8,00                lucky                     8,90            extreme low
BT                   F       8,00                lucky                     8,90            extreme low
AG                   M       8,00                lucky                     8,90            extreme low
BT                   F       8,30                normal                    8,98            extreme low
BZ                   M       8,40                normal                    9,00            normal
AG                   M       8,40                normal                    9,00            normal
GJ                   F       8,70                normal                    9,00            normal
PH                   M       8,70                normal                    9,00            normal
VL                   M       8,70                normal                    9,00            normal
AG                   F       8,80                normal                    9,03            normal
BC                   F       8,90                normal                    8,98            extreme low
AG                   M       8,90                normal                    8,98            extreme low
OT                   M       8,90                normal                    9,05            normal
DJ                   F       8,90                normal                    9,05            extreme high
PH                   F       9,00                normal                    8,93            extreme low
OT                   F       9,00                normal                    8,93            extreme low
PH                   F       9,00                normal                    8,93            extreme low
BC                   M       9,00                normal                    8,93            extreme low
BZ                   M       9,00                normal                    9,00            normal
AG                   M       9,00                normal                    9,00            normal
CT                   M       9,00                normal                    9,00            normal
GJ                   M       9,00                normal                    9,00            normal
MH                   M       9,00                normal                    9,00            normal
AG                   F       9,00                normal                    9,00            normal
MM                   F       9,00                normal                    9,00            normal
BZ                   M       9,10                normal                    8,95            extreme low
DJ                   M       9,10                normal                    8,95            extreme low
SV                   M       9,20                normal                    8,90            extreme low
GJ                   F       9,20                normal                    8,98            extreme low
NT                   M       9,30                normal                    8,93            extreme low
DJ                   F       9,70                normal                    9,03            normal
BZ                   M       9,80                normal                    9,05            extreme high
VN                   F       9,80                normal                    9,05            extreme high
BZ                   M       9,80                normal                    9,05            extreme high
GJ                   M       10,00               smart                     9,03            normal
SV                   M       10,00               smart                     9,03            normal
IF                   F       10,00               smart                     9,03            normal
DB                   M       10,00               smart                     9,03            normal
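The categorisation step behaves like VLOOKUP with approximate match: each mark receives the category of the largest lower limit (LL) that does not exceed it. A minimal Python sketch of that behaviour follows; the function name `lookup_category` is hypothetical, and the 8,25 threshold in `BAC_TABLE` is an assumption inferred from the categorised table (8,00 is "lucky", 8,30 is "normal").

```python
from bisect import bisect_right

# Table arrays as (lower limit, category), sorted by lower limit.
BAC_TABLE = [(6.5, "lucky"), (8.25, "normal"), (10.0, "smart")]
ADMISSION_TABLE = [(8.75, "extreme low"), (8.98, "normal"), (9.05, "extreme high")]

def lookup_category(value, table):
    """Approximate-match lookup: category of the largest lower limit <= value."""
    lower_limits = [lower for lower, _ in table]
    index = bisect_right(lower_limits, value) - 1
    if index < 0:
        # VLOOKUP would return #N/A for a value below the first lower limit
        raise ValueError(f"{value} is below the first lower limit")
    return table[index][1]

for mark in (6.5, 8.0, 8.3, 10.0):
    print(mark, lookup_category(mark, BAC_TABLE))
for mark in (8.9, 9.0, 9.05):
    print(mark, lookup_category(mark, ADMISSION_TABLE))
```

As with VLOOKUP, the table must be sorted by lower limit for the lookup to be meaningful.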
Using the data above, I clicked the PivotTable button on the Insert tab and created the pivot table below:
Report filters: Admission Categories = (All); Baccalaureate Categories = (All)
Values: Count of Gender
Row labels (Unitatea selectoare): AG, AR, BC, BR, BT, BZ, CS, CT, DB, DJ, GJ, IF, MH, MM, NT, OT, PH, SV, VL, VN, Grand Total
The pivot table shows us that:
- most of the students enrolled this year came from ARGES county (14 students from AG);
- the most frequent admission mark among the enrolled students was 9,00 (12 students).
Notice that we can make the same observations for the baccalaureate marks too if we switch Admission Mark with Baccalaureate Mark in the Column Labels section of the pivot table.
- By choosing the extreme high admission category from the report filter, we can find which mark was the extreme high mark at the admission exam, how many students obtained it and from which counties:
Report filters: Admission Categories = extreme high; Baccalaureate Categories = (All)
Count of Gender by Unitatea selectoare (rows) and Admission Mark (columns):
Unitatea selectoare   9,05   Grand Total
BZ                    2      2
DJ                    1      1
VN                    1      1
Grand Total           4      4
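The two headline counts read from the pivot table can be reproduced with a plain frequency count. The sketch below uses `collections.Counter` on the county and Admission Mark columns of the data table; marks are kept as strings exactly as displayed, and the variable names are illustrative rather than taken from the workbook.

```python
from collections import Counter

# Selection units (counties) of the 50 admitted candidates, in table order.
COUNTIES = (
    "AG AG AG AR AG AG DJ AG AG CS BR AG SV BT AG BT BZ AG GJ PH VL "
    "AG BC AG OT DJ PH OT PH BC BZ AG CT GJ MH AG MM BZ DJ SV GJ NT DJ BZ VN "
    "BZ GJ SV IF DB"
).split()

# Admission marks as displayed in the data table (kept as strings).
ADMISSION_MARKS = (
    "8,98 8,90 8,75 8,75 8,80 8,80 8,85 8,85 8,93 8,93 8,95 8,95 8,90 8,90 "
    "8,90 8,98 9,00 9,00 9,00 9,00 9,00 9,03 9,05 9,05 8,98 8,98 9,00 9,00 "
    "9,00 9,00 9,00 9,00 9,00 8,93 8,93 8,93 8,93 8,95 8,95 8,98 8,90 8,93 "
    "9,03 9,05 9,05 9,05 9,03 9,03 9,03 9,03"
).split()

county_counts = Counter(COUNTIES)
mark_counts = Counter(ADMISSION_MARKS)

print("Students per county:", county_counts.most_common(3))
print("Most frequent admission mark:", mark_counts.most_common(1))
```

`most_common` sorts by count, which is the same reading a pivot table gives when its Grand Total column is sorted in descending order.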
III Calculate the regression line and interpret and test the regression coefficients, coefficient of determination and coefficient of correlation. Interpret the results.
a) Calculate the regression line
The regression line is described by y = a + bx, and its parameters can be computed like this:
b = 0,059
a = 8,444
Coefficients:
Intercept            8,44436615
Baccalaureate Mark   0,059567029
[Figure: chart of the regression; horizontal axis: BAC grades.]
The a parameter is 8,444. It is the intercept of the regression function and has no economic significance. Geometrically, it is the point where the regression line intersects the OY axis. The b parameter, 0,059, is the slope of the regression line and is called the regression coefficient. Because it is positive, the relationship between the two marks is positive: when the Baccalaureate mark increases by 1 point, the Admission mark increases on average by 0,059 points.
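The least-squares formulas behind a and b can be sketched in a few lines. The helper names `fit_line` and `predict_admission` are hypothetical, and the five-point data set is a toy example chosen so that the result is easy to verify by hand; it is not the case-study data. The last part uses the coefficients reported by Excel to predict an Admission Mark.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx             # slope (regression coefficient)
    a = mean_y - b * mean_x   # intercept: the line passes through the two means
    return a, b

# Toy data (not the case-study marks): y rises by 0.05 for each extra point of x.
a_toy, b_toy = fit_line([6.0, 7.0, 8.0, 9.0, 10.0], [8.80, 8.85, 8.90, 8.95, 9.00])
print(f"toy fit: a = {a_toy:.3f}, b = {b_toy:.3f}")

# Coefficients reported in the Excel regression output.
INTERCEPT = 8.44436615
SLOPE = 0.059567029

def predict_admission(bac_mark):
    """Predicted Admission Mark for a given Baccalaureate Mark."""
    return INTERCEPT + SLOPE * bac_mark

print(f"predicted admission mark for a BAC mark of 9.0: {predict_admission(9.0):.3f}")
print(f"prediction at the mean BAC mark (8.606): {predict_admission(8.606):.3f}")
```

The final line illustrates a useful sanity check: a least-squares line always passes through the point formed by the two means, so predicting at the mean Baccalaureate Mark should return the mean Admission Mark.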
Regression Statistics
Multiple R = 0,758
R Square = 0,575
Interpreting the coefficient of correlation (Multiple R = 0,758398211), we can say that there is a relationship between the two variables; because the coefficient is positive and fairly close to 1, the relationship between the baccalaureate marks and the admission marks is a fairly strong positive one (R ≈ 0,76).
The coefficient of determination (R Square) is 0,575167847, meaning that 57,52% of the variation of the Admission Marks can be explained by the variation of the Baccalaureate Marks, and the remaining 42,48% by the variation of other factors. The Adjusted R Square value is 0,566317177, meaning that, after adjusting for the number of predictors, about 56,6% of the variation of the Admission Marks is explained by the regression model y = 0,059x + 8,444.
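The three regression statistics are linked by simple identities, which the sketch below checks using the reported values: in a regression with one predictor, R Square is the square of Multiple R, and Adjusted R Square follows from R Square, the number of observations and the number of predictors. The variable names are illustrative.

```python
# Values reported in the Excel regression output.
multiple_r = 0.758398211
r_square = 0.575167847
n_obs = 50          # number of observations
n_predictors = 1    # Baccalaureate Mark is the only predictor

# With a single predictor, R Square equals the squared correlation coefficient.
r_square_from_r = multiple_r ** 2

# Adjusted R Square penalises R Square for the number of predictors used.
adjusted_r_square = 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(f"Multiple R squared: {r_square_from_r:.6f}")
print(f"Adjusted R Square:  {adjusted_r_square:.6f}")
print(f"Unexplained share:  {1 - r_square:.2%}")
```

With only one predictor and 50 observations the adjustment is small, which is why the two R Square values differ by less than one percentage point.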