❖ https://www.kaggle.com/uciml/student-alcohol-consumption#student-mat.csv
Data Variables
❖ The data set has many different variables, from
having a girlfriend or boyfriend to study time and internet access,
from alcohol usage to absences, and so on.
consumption (numeric: from 1 - very low to 5 - very high)
25% 1.000000
50% 1.000000
75% 2.000000
max 5.000000
Correlation between alcohol and grades
Dalc
count 395.000000
mean 1.481013
std 0.890741
min 1.000000
25% 1.000000
50% 1.000000
75% 2.000000
max 5.000000
Important info here: look at the minimum. It is 1, the lowest
value on the scale, which means everyone drinks at least a little.
There is no one in the data who does not drink!
Weekday Alcohol Consumption Distribution
Here we can see the weekday alcohol
consumption of all students. There is no
student who does not
consume alcohol on weekdays.
The majority of the students drink
at least once a week.
Weekday Versus Weekend
The correlation
coefficient between
weekday alcohol consumption and final
grade is 0.054, which
indicates that there is essentially no
linear correlation between them!
That is very surprising to me!
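To make the number above concrete, here is a minimal sketch of how a Pearson correlation coefficient is computed. It uses only the standard library, and the toy values are made up for illustration; they are not taken from student-mat.csv, and the helper name pearson_r is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up toy values, NOT the real Dalc and G3 columns
toy_dalc = [1, 1, 2, 3, 5]
toy_g3 = [10, 12, 11, 9, 10]
print(round(pearson_r(toy_dalc, toy_g3), 3))
```

With the real data loaded into a pandas DataFrame, the same quantity should be available as student_df['Dalc'].corr(student_df['G3']), assuming student_df holds student-mat.csv.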
Let’s Be More Careful Here
Seemingly no correlation.
Get ready for a surprising fact, and don't start drinking just yet.
Since we don't have any student who does not drink, we
cannot say that there is no correlation. If we had one group
who drink and another group who do not, we could
get much more accurate results and might find some
correlation between alcohol and grades. That is why I am
going to separate the students into two groups: those who drink
less and those who drink more.
It Is Time To Test It!!!
In order to avoid the mistake mentioned above, we will separate
the students into two groups. Group one are the ones who drink
alcohol once (Dalc == 1) and group two are the ones who drink
2 or more on weekdays (Dalc >= 2).

from scipy import stats

# Group one: lowest weekday alcohol level; group two: level 2 or higher
student_drink_less = student_df[student_df['Dalc'] == 1]
student_drink_much = student_df[student_df['Dalc'] >= 2]
# Two-sample t-test on the final grade (G3)
stats.ttest_ind(student_drink_less['G3'], student_drink_much['G3'])

When I ran the t-test here, I got a p-value of 0.036, which is less
than the alpha level, so we reject the null hypothesis. This means
that there is a significant difference in the mean final grade
between the students who drink alcohol once and the ones who
drink 2 or more.
# according to this data set, drinking once looks fine :)
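As a rough illustration of what the test above does: with its default settings, scipy's stats.ttest_ind performs Student's two-sample t-test with pooled variance. The sketch below reproduces only the t statistic and the degrees of freedom using the standard library; it does not compute the p-value, which comes from the t distribution. The function name pooled_t_statistic and the toy values are hypothetical, not from the data set.

```python
import math

def pooled_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for Student's two-sample t-test
    with pooled variance (equal variances assumed)."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    mean_a = sum(a) / n_a
    mean_b = sum(b) / n_b
    # Sums of squared deviations within each group
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("t is undefined when both groups have zero variance")
    return (mean_a - mean_b) / std_err, df

# Made-up toy grades, NOT the real G3 values of the two groups
toy_drink_less = [12, 14, 11, 15, 13]
toy_drink_much = [10, 12, 9, 13, 11]
t, df = pooled_t_statistic(toy_drink_less, toy_drink_much)
print(round(t, 3), df)
```

A larger absolute t value (for the given degrees of freedom) corresponds to a smaller p-value, which is what the comparison against the alpha level relies on.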
Mothers Education Level Versus Final Grades
Students who get the
highest median score are
the ones whose mother is
most educated.
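A comparison like this one can be sketched by grouping final grades by mother's education level and taking the median of each group. The example below uses only the standard library; the column name Medu for mother's education is an assumption (the slides do not name it), median_by_group is a hypothetical helper, and the rows are made up for illustration.

```python
from collections import defaultdict
from statistics import median

def median_by_group(records, group_key, value_key):
    """Map each group value to the median of value_key within that group."""
    groups = defaultdict(list)
    for row in records:
        groups[row[group_key]].append(row[value_key])
    return {group: median(values) for group, values in sorted(groups.items())}

# Made-up toy rows, NOT taken from student-mat.csv.
# "Medu" is an assumed column name for mother's education level.
toy_rows = [
    {"Medu": 1, "G3": 8},
    {"Medu": 1, "G3": 10},
    {"Medu": 2, "G3": 11},
    {"Medu": 4, "G3": 13},
    {"Medu": 4, "G3": 15},
]
print(median_by_group(toy_rows, "Medu", "G3"))
```

With the data in a pandas DataFrame, student_df.groupby('Medu')['G3'].median() should give the same kind of summary, under the same column-name assumption.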
Mothers Education Level Versus Final Grades
I wanted to look further into the correlation between final
grade and mother's education level.
I separated the students into two groups.
Group one are the ones whose mother has education level 2 or
less.
Group two are the ones whose mother has education level 3 or
more.
When I ran the t-test here, I got a p-value of 0.0002, which is
far below the alpha level, so we reject the
null hypothesis. This means that there is a very
significant difference in the mean final grade
between the students whose mother has
education level 2 or less
and the ones whose mother has
education level 3 or more.
Further Study Suggestions
I have identified more correlations between
students' grades and other factors, but they
need more attention in further studies.
Wants Higher Level Education