
Business Mathematics and Statistics
Hypothesis and Testing of Hypothesis
Regression and Correlation Analysis
Class 10
25th March 2019
Hypothesis
• A hypothesis is a tentative, testable answer to a scientific question.
• A hypothesis is a proposed explanation for a phenomenon.
• A supposition or explanation (theory) that is provisionally accepted in order to
interpret certain events or phenomena, and to provide guidance for further
investigation.
• A hypothesis is a suggested solution for an unexplained occurrence that does
not fit into current accepted scientific theory.
• The basic idea of a hypothesis is that there is no pre-determined outcome.
• A hypothesis may be proven correct or wrong, and must be capable of
refutation.
• If it remains unrefuted by facts, it is said to be verified or corroborated.
• For a hypothesis to be termed a scientific hypothesis, it has to be something that
can be supported or refuted through carefully crafted experimentation or
observation.
• This is called falsifiability and testability
Example of Hypothesis
• Hypothesis is developed to answer a research question of particular interest.
• A hypothesis has classically been referred to as an educated guess.
• Sales of ice cream increase in summer.
• Female borrowers are good at repaying loans.
• A university degree increases the chance of getting a higher salary.
• Distraction in the classroom lowers test results.
• If I add fertilizer to my garden, then my plants will grow faster.
Types of Hypothesis
• A null hypothesis (H0) exists when a researcher believes there is no relationship
between the two variables, or there is a lack of information to state a scientific
hypothesis. This is something to attempt to disprove or discredit.
• There is no significant change in my health during the times when I drink green tea only or root
beer only.
• This is where the alternative hypothesis (H1) enters the scene. In an attempt to
disprove a null hypothesis, researchers will seek to discover an alternative hypothesis.
• My health improves during the times when I drink green tea only, as opposed to root beer only.
• A simple hypothesis is a prediction of the relationship between two variables: the
independent variable and the dependent variable.
• Drinking sugary drinks daily leads to obesity.
• A complex hypothesis examines the relationship between two or more independent
variables and two or more dependent variables.
• Overweight adults who 1) value longevity and 2) seek happiness are more likely than other
adults to 1) lose their excess weight and 2) feel a more regular sense of joy.
Cont…
• logical hypothesis
• empirical hypothesis
• statistical hypothesis
Characteristics of Good Hypothesis
• A hypothesis should be based on information from reference materials about the
topic.
• At least one clear prediction could be made from the hypothesis.
• Predictions resulting from the hypothesis must be testable in an experiment.
• A prediction must have both an independent variable (something you change) and
a dependent variable (something you observe or measure).
Testing of Hypothesis
• The primary trait of a hypothesis is that something can be tested and that those
tests can be replicated.
• Hypothesis testing in statistics is a way to test the results of a survey or
experiment to see whether they are meaningful.
• You’re basically testing whether your results are valid by figuring out the odds
that your results have happened by chance.
• If your results may have happened by chance, the experiment won’t be
repeatable and so has little use.
• During a test, the scientist may try to prove or disprove just the null hypothesis
or test both the null and the alternative hypothesis.
• As sufficient data and evidence are gathered to support a hypothesis, it becomes
a working hypothesis, which is a milestone on the way to becoming a theory.
Cont…
• A researcher thinks that if knee surgery patients go to physical therapy twice a
week (instead of 3 times), their recovery period will be longer. The average
recovery time for knee surgery patients is 8.2 weeks.
• The hypothesis statement in this question is that the researcher believes the average
recovery time is more than 8.2 weeks. It can be written in mathematical terms as:
• H1: μ > 8.2
• Next, there is a need to state the null hypothesis. (That’s what will happen if the
researcher is wrong.) In the above example, if the researcher is wrong then the recovery
time is less than or equal to 8.2 weeks. In math, that’s:
• H0: μ ≤ 8.2
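• P Values
• A P value of 0.05 (5%) or less is conventionally taken as evidence that the
results are unlikely to have arisen by chance alone.
• If the P value of our result is less than or equal to 0.05 (5%), the null hypothesis
is rejected in favour of the alternative hypothesis.
• Technically it is stated as, “We reject the null hypothesis”. When the P value is
greater than 0.05, we say, “We fail to reject the null hypothesis”.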
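The decision rule for P values can be sketched in a few lines of Python. This is only an illustration: the function name `decide` and the two P values passed to it are hypothetical, and the 0.05 threshold is the conventional default rather than a fixed rule.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the conventional test decision for a given P value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    # Small P value: the data would be unusual if H0 were true.
    if p_value <= alpha:
        return "reject H0"
    # Otherwise the evidence is too weak to discard H0.
    return "fail to reject H0"


# Hypothetical P values, chosen only to show both outcomes.
print(decide(0.03))
print(decide(0.20))
```

Note that "fail to reject H0" is not the same as proving H0 true; it only means the data did not provide enough evidence against it.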
Regression and Correlation Analysis
• Correlation analysis is used to quantify the association between two
continuous variables (e.g., between an independent and a dependent variable
or between two independent variables).
• Regression analysis is a related technique to assess the relationship between an
outcome variable and one or more risk factors or confounding variables.
• The outcome variable is also called the response or dependent
variable, and the risk factors and confounders are called the predictors, or
explanatory or independent variables.
• In regression analysis, the dependent variable is denoted "y" and the
independent variables are denoted by "x".
Correlation Analysis
• In correlation analysis, we estimate a sample correlation coefficient, more specifically
the Pearson Product Moment correlation coefficient (another common type is
Spearman's correlation coefficient).
• The sample correlation coefficient, denoted r, ranges between -1 and +1 and
quantifies the direction and strength of the linear association between the two variables.
• The correlation between two variables can be positive (i.e., higher levels of one
variable are associated with higher levels of the other) or negative (i.e., higher levels
of one variable are associated with lower levels of the other).
• The sign of the correlation coefficient indicates the direction of the association.
• The magnitude of the correlation coefficient indicates the strength of the association.
• For example, a correlation of r = 0.9 suggests a strong, positive association between
two variables, whereas a correlation of r = -0.2 suggests a weak, negative association.
• A correlation close to zero suggests no linear association between two continuous
variables.
Problem:
A small study is conducted involving 17 summer days to investigate the
association between temperature, measured in degrees centigrade, and ice
cream sales, measured in cups.
We wish to estimate the association between temperature and ice cream sales.
In this example, ice cream sales is the dependent variable and temperature is
the independent variable. Thus y = ice cream sales and x = temperature.
• Note that the independent variable is on the horizontal axis (or X-axis), and the
dependent variable is on the vertical axis (or Y-axis).
• Each point represents an (x, y) pair.
• The scatter plot shows a positive or direct association between temperature and ice
cream sales.
• Higher temperatures are associated with higher ice cream sales, and lower
temperatures with lower sales.
Cont..
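• The formula for the sample correlation coefficient is:
$$ r = \frac{\mathrm{Cov}(x,y)}{\sqrt{s_x^2 \, s_y^2}} $$
• Cov(x,y) is the covariance of x and y; the formula for covariance is
$$ \mathrm{Cov}(x,y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} $$
• $s_x^2$ and $s_y^2$ are the sample variances of x and y, and the formulas are
$$ s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1}, \qquad s_y^2 = \frac{\sum (y - \bar{y})^2}{n - 1} $$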
• The variances of x and y measure the variability of the x scores and y scores
around their respective sample means (x̅ and ȳ considered separately).
• The covariance measures the variability of the (x,y) pairs around the mean of x
and mean of y, considered simultaneously.
• To compute the sample correlation coefficient, we need to compute the variance
of temperature, the variance of ice cream sales and also the covariance of
temperature and ice cream sales.
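As a minimal sketch of these three quantities, the Python below computes the sample variance, the sample covariance and Pearson's r directly from their definitions. It assumes the usual sample convention of n − 1 in the denominators. The five (x, y) pairs are illustrative values only (they are not the 17-day ice cream data), and the function names are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    # Average squared deviation from the mean, with n - 1 in the denominator.
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def sample_covariance(xs, ys):
    # Average product of paired deviations, with n - 1 in the denominator.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    # r = Cov(x, y) / sqrt(var(x) * var(y))
    return sample_covariance(xs, ys) / sqrt(sample_variance(xs) * sample_variance(ys))


# Illustrative data only.
xs = [55, 60, 65, 70, 80]
ys = [52, 54, 56, 58, 62]

print(sample_variance(xs), sample_variance(ys), sample_covariance(xs, ys))
print(pearson_r(xs, ys))
```

These illustrative points happen to lie exactly on a straight line, so r comes out as 1 (up to floating-point rounding), the upper limit of the coefficient.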
Cont..
• Mean of X = 38.4
• Variance of X = 10
• Mean of Y = 2902
• Variance of Y = 485578.8
• The covariance of temperature and sales = 1798.025
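• Correlation coefficient is r = 1798.025/2199.4 = 0.82, where 2199.4 is the square
root of the product of the two variances (apparently computed before the variance
of X was rounded to 10).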

• The sample correlation coefficient indicates a strong positive correlation.
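The arithmetic can be checked with a short Python sketch that plugs the quoted summary statistics into r = Cov(x,y) / sqrt(variance of x × variance of y). The variable names are hypothetical, and because the variance of X is quoted in rounded form, the denominator here differs slightly from the 2199.4 used above.

```python
from math import sqrt

# Summary statistics as quoted in the slide (variance of X is rounded).
var_x = 10.0
var_y = 485578.8
cov_xy = 1798.025

# Pearson's r from the covariance and the two variances.
r = cov_xy / sqrt(var_x * var_y)
print(round(r, 2))
```

To two decimal places the result agrees with the value of 0.82 reported above.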


Significance test
• To test whether the association is merely apparent and might have arisen by
chance, use the t test in the following calculation:
$$ t = r \sqrt{\frac{n - 2}{1 - r^2}} $$
• $t = 0.82 \times \sqrt{(17 - 2)/(1 - 0.82^2)} = 5.55$
• In the t table at 17 - 2 = 15 degrees of freedom we find that at t = 5.55, P < 0.001, so
the correlation coefficient may be regarded as highly significant.
• Thus (as could be seen immediately from the scatter plot) we have a very
strong correlation between temperature and ice cream sales which is most
unlikely to have arisen by chance.
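The test statistic above can be reproduced with a few lines of Python. This is a sketch of the calculation t = r·sqrt((n − 2)/(1 − r²)) only. The function name `t_statistic` is hypothetical, and the P value itself still has to be read from a t table (or a statistics library) at n − 2 degrees of freedom.

```python
from math import sqrt


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a sample correlation r differs from 0."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt((n - 2) / (1 - r ** 2))


# Values from the ice cream example: r = 0.82 from 17 days.
t = t_statistic(0.82, 17)
df = 17 - 2
print(round(t, 2), df)
```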
Some Problems
1. A researcher carefully computes the correlation coefficient between two
variables and gets r = 1.23. What does this value mean?
2. A researcher wishes to test the idea that shoe size and mathematical ability
are correlated; that is, people with larger feet have higher mathematical
skills. To test this he conducts a study of an entire town of 2000 persons,
measuring their shoe size and administering a math test.
1. He finds that there is a significant correlation between shoe size and math skills, with
people with larger feet having higher math skills.
2. What might be an important problem with this approach?
3. In the previous problem the researcher decides to use data only for adults
ages 21 to 60 to compute a correlation coefficient. What value of r should
he expect?
Regression Analysis
• Regression analysis is a set of statistical processes for estimating the
relationships among variables.
• It includes many techniques for modeling and analyzing several variables, when
the focus is on the relationship between a dependent variable and one or more
independent variables (or 'predictors').
• More specifically, regression analysis helps one understand how the typical
value of the dependent variable (or 'criterion variable') changes when any one
of the independent variables is varied, while the other independent variables are
held fixed.
• Regression analysis is widely used for prediction and forecasting
• Regression analysis is also used to understand which among the independent
variables are related to the dependent variable, and to explore the forms of these
relationships.
• In restricted circumstances, regression analysis can be used to infer causal
relationships between the independent and dependent variables.
Cont..
• Many techniques for carrying out regression analysis have been developed.
• Linear regression and ordinary least squares regression are the most common and
familiar techniques.
Equation and Formula
• The formula for regression is as follows,
• Regression Equation y = a + bx
• Slope: $b = \dfrac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$
• Intercept: $a = \dfrac{\sum Y - b\sum X}{N}$
Where,
• x and y are the variables.
• b = the slope of the regression line, also called the regression coefficient
• a = the intercept, the point where the regression line crosses the y-axis
• N = Number of values or elements
• X = First Score
• Y = Second Score
• ∑XY = Sum of the product of the first and Second Scores
• ∑X = Sum of First Scores
• ∑Y = Sum of Second Scores
• ∑X² = Sum of the squares of the First Scores.
Regression Problem
• Determine the regression equation by using the regression slope coefficient
and intercept value as shown in the regression table given below.
X Values Y Values
55 52
60 54
65 56
70 58
80 62

• For the given data set, solve for the regression slope and intercept values.
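One way to check a hand calculation is a short Python sketch of the slope and intercept formulas, b = (N∑XY − (∑X)(∑Y)) / (N∑X² − (∑X)²) and a = (∑Y − b∑X) / N, applied to the five (X, Y) pairs in the table. The function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the sums formula, then intercept from the slope.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# X and Y values from the regression table.
xs = [55, 60, 65, 70, 80]
ys = [52, 54, 56, 58, 62]

a, b = fit_line(xs, ys)
print(a, b)
```

For this data set the intermediate sums are N = 5, ∑X = 330, ∑Y = 282, ∑XY = 18760 and ∑X² = 22150, which give b = 740/1850 = 0.4 and a = 30, so the fitted line is y = 30 + 0.4x.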
Correlation and Regression Problem
Problem 1
X Y
72 45
73 38
75 41
76 35
77 31
78 40
79 25
80 32
80 36
81 29
82 34
83 38
84 26
85 32
86 28
88 27

a) Sketch a scatterplot.
b) Compute the correlation coefficient, r.
c) Compute the coefficients of the linear regression line, y = b1x + b0.
d) What is the estimated value for X = 80?

Problem 2
X Y
1 16
2 23
4 35
3 28
5 44
6 40
3 22
8 61
9 82

a) Sketch a scatterplot.
b) Compute the correlation coefficient, r.
c) Compute the coefficients of the linear regression line, y = b1x + b0.
d) What is the estimated value for X = 7?
