
Presentation of

BUSINESS RESEARCH METHODOLOGY on


ANALYSIS OF RELATIONSHIPS AND STATISTICAL INFERENCES FOR ONE OR TWO SAMPLES

Presented by: Ruchi Jain, Shivani Shivhare, Niharika Agrawal (MBA II Sem, Sec: B)

Analysis Of Relationship
In the case of a bivariate or multivariate population, i.e., when the data involve two variables or more than two variables respectively, measures of relationship are used. Two techniques are available for determining the relationship between two or more variables:
1. Correlation technique
2. Regression technique

1. CORRELATION

Correlation can be defined as a quantitative measure of the degree or strength of relationship that may exist between two variables.

The goal of a correlation analysis is to see whether two measurement variables covary, and to quantify the strength of the linear relationship between them. The prominent correlation coefficients used to analyse the relationship between variables are:
In the case of a bivariate population:
1. Charles Spearman's rank correlation coefficient
2. Karl Pearson's coefficient of correlation (the product-moment coefficient)
In the case of a multivariate population:
1. Coefficient of multiple correlation
2. Coefficient of partial correlation

WHY USE CORRELATION

We can use the correlation coefficient to test whether there is a linear relationship between the variables. To quantify the strength of the relationship, we calculate the correlation coefficient (r). Its numerical value ranges from +1.0 to -1.0: r > 0 indicates a positive linear relationship, r < 0 indicates a negative linear relationship, and r = 0 indicates no linear relationship.
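
As a quick illustration of computing r, the sketch below uses Python's scipy library on made-up advertising and sales figures; the variable names and numbers are assumptions for demonstration only.

```python
# A minimal sketch of computing Pearson's r for two measurement variables.
# The data are invented illustration values, not from the presentation.
from scipy import stats

advertising = [10, 12, 15, 17, 20, 23, 27, 30]   # hypothetical X values
sales       = [25, 29, 32, 37, 40, 46, 52, 55]   # hypothetical Y values

r, p_value = stats.pearsonr(advertising, sales)
print(f"r = {r:.3f}")            # close to +1 -> strong positive linear relationship
print(f"p-value = {p_value:.4f}")
```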

2. REGRESSION

Regression expresses the relationship in the form of an equation. Regression analysis, in a general sense, means the estimation or prediction of the unknown value of one variable from the known value of the other variable.

WHY USE REGRESSION


In regression analysis, the problem of interest is the nature of the relationship itself between the dependent (response) variable and the independent (explanatory) variable. The analysis consists of choosing and fitting an appropriate model, done by the method of least squares, with a view to exploiting the relationship between the variables to help estimate the expected response for a given value of the independent variable. The following regression techniques are used to study the cause-and-effect relationship:
In the case of a bivariate population: the simple regression equation
In the case of a multivariate population: the least squares method (multiple regression)

Hypothesis Testing of Correlation Coefficients


1. Karl Pearson's coefficient of correlation

It measures the strength of the linear relationship between two variables. The coefficient of correlation r measures the degree of association between the two sets of values of the related variables given in the data set. It takes values from +1 to -1. If two sets of data have r = +1, they are said to be perfectly positively correlated; if r = -1, they are perfectly negatively correlated; and if r = 0, they are uncorrelated.

To test whether or not r is significantly different from zero, we use the t-test for Pearson's r:

t_ob = r * sqrt((n - 2) / (1 - r^2))

Since this is like any other hypothesis test, we compare t_ob with t_crit. For this test, we use the given alpha level (conventionally .05 or .01) and df = n - 2. We subtract 2 from the sample size because we have two variables. So, with alpha = .05, is the correlation significant?
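
The calculation above can be checked in code. The sketch below assumes an example correlation of r = 0.65 from a sample of n = 20 (both invented values) and compares t_ob with the critical t obtained from scipy.

```python
# Hedged sketch: testing H0: rho = 0 with the t statistic given above.
# r and n are assumed example values, not taken from the slides.
from math import sqrt
from scipy import stats

r, n = 0.65, 20                                 # assumed sample correlation and sample size
t_ob = r * sqrt((n - 2) / (1 - r**2))
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)    # two-tailed critical value, alpha = .05

print(f"t_ob = {t_ob:.2f}, t_crit = {t_crit:.2f}")
if abs(t_ob) > t_crit:
    print("Reject H0: the correlation is significant at the .05 level.")
else:
    print("Fail to reject H0: the correlation is not significant.")
```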

Bivariate Regression Analysis


LINEAR REGRESSION ANALYSIS

Linear regression analysis is a powerful technique used for predicting the unknown value of a variable from the known value of another variable. More precisely, if X and Y are two related variables, then linear regression analysis helps us predict the value of Y for a given value of X, or vice versa. Suppose we have a sample of size n with two sets of measures, denoted by x and y. We can predict the values of y from the values of x by using the equation called the REGRESSION EQUATION:

y* = a + bx

where a and b are two constants known as the regression coefficients (the intercept and the slope).

Least Squares Method of Regression Analysis

When the correlation between the two variables is not perfect, the data points do not fall on a straight line. In such a situation, this method helps the researcher find the line that fits the set of data with minimum error. The regression equation is:

Y = a + bX + e

where Y = dependent variable, a = Y-intercept, b = slope of the line, X = independent variable, and e = error (residual).

The constants a and b are calculated as:

b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²,   a = ȳ - b·x̄
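
A minimal sketch of these formulas in Python, using invented x and y values, might look like this:

```python
# Sketch of the least-squares calculation of a and b (data are illustrative).
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)    # hypothetical independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])    # hypothetical dependent variable

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
a = y.mean() - b * x.mean()                                                # intercept

y_hat = a + b * x     # fitted values on the least-squares line
e = y - y_hat         # residuals, the "e" term in Y = a + bX + e
print(f"a = {a:.3f}, b = {b:.3f}")
```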

HOW GOOD IS THE REGRESSION?

Statistically, this is equivalent to testing the null hypothesis that the relevant regression coefficient is zero, which can be done using a t-test. If the t-test of a regression coefficient is significant, it indicates that the variable in question influences Y significantly while controlling for the other independent explanatory variables.
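
As a hedged illustration, scipy's linregress reports the two-sided p-value for the t-test of the slope in a simple (bivariate) regression; the data below are invented, and a model with several explanatory variables would need a library such as statsmodels to report one t-test per coefficient.

```python
# Sketch: testing whether the slope coefficient differs from zero.
# scipy.stats.linregress returns a two-sided p-value for H0: b = 0.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]                       # hypothetical independent variable
y = [3.1, 4.8, 7.2, 8.9, 11.1, 12.8, 15.2, 16.9]   # hypothetical dependent variable

result = stats.linregress(x, y)
print(f"b = {result.slope:.3f}, t-test p-value = {result.pvalue:.4g}")
```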

USES OF CORRELATION AND REGRESSION

There are three main uses for correlation and regression.
- One is to test hypotheses about cause-and-effect relationships. In this case, the experimenter determines the values of the X variable and sees whether variation in X causes variation in Y, for example, giving people different amounts of a drug and measuring their blood pressure.
- The second main use for correlation and regression is to see whether two variables are associated, without necessarily inferring a cause-and-effect relationship. In this case, neither variable is determined by the experimenter; both are naturally variable. If an association is found, the inference is that variation in X may cause variation in Y, or variation in Y may cause variation in X, or variation in some other factor may affect both X and Y.
- The third common use of linear regression is estimating the value of one variable corresponding to a particular value of the other variable.

Statistical Inferences
Statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation.
There are three general areas that make up the field of statistics: descriptive statistics, relational statistics, and inferential statistics.

1. Descriptive statistics fall into one of two categories: measures of central tendency (mean, median, and mode) or measures of dispersion (standard deviation and variance). Their purpose is to explore hunches that may have come up during the course of the research process, but most people compute them to look at the normality of their numbers. Examples include descriptive analysis of sex, age, race, social class, and so forth.

2. Relational statistics fall into one of three categories: univariate, bivariate, and multivariate analysis.
I. Univariate analysis is the study of one variable for a subpopulation, for example, the age of murderers. The analysis is often descriptive, but you'd be surprised how many advanced statistics can be computed using just one variable.
II. Bivariate analysis is the study of a relationship between two variables, for example, murder and meanness, and the most commonly known technique here is correlation.
III. Multivariate analysis is the study of relationships between three or more variables, for example, murder, meanness, and gun ownership. For all techniques in this area, you simply take the word "multiple" and put it in front of the bivariate technique, as in multiple correlation.

3. Inferential statistics, also called inductive statistics, fall into one of two categories: tests for difference of means and tests for statistical significance. The latter are further subdivided into parametric or nonparametric, depending upon whether you're inferring to the larger population as a whole (parametric) or only to the people in your sample (nonparametric).
- The purpose of difference-of-means tests is to test hypotheses, and the most common techniques are called Z-tests.
- The most common parametric tests of significance are the F-test, t-test, ANOVA, and regression.

To summarize:
- Descriptive statistics (mean, median, mode; standard deviation, variance)
- Relational statistics (correlation, multiple correlation)
- Inferential tests for difference of means (Z-tests)
- Inferential parametric tests for significance (F-tests, t-tests, ANOVA, regression)

MEASURES OF CENTRAL TENDENCY

1. MEAN
The most commonly used measure of central tendency is the mean. To compute the mean, you add up all the numbers and divide by how many numbers there are. It is not necessarily the typical value or the halfway point, but a kind of center that balances high numbers with low numbers. For this reason, it is most often reported along with some simple measure of dispersion, such as the range, which is expressed as the lowest and highest number.

2. MEDIAN
The median is the number that falls in the middle of a range of numbers. It's not the average; it's the halfway point. There are always just as many numbers above the median as below it. In cases where there is an even set of numbers, you average the two middle numbers. The median is best suited for data that are ordinal, or ranked. It is also useful when you have extremely low or high scores.

3. MODE
The mode is the most frequently occurring number in a list of numbers. It's the closest thing to what people mean when they say something is average or typical. The mode doesn't even have to be a number; it will be a category when the data are nominal or qualitative. The mode is useful when you have a highly skewed set of numbers, mostly low or mostly high. You can also have two modes (a bimodal distribution) when one group of scores is mostly low and the other group is mostly high, with few in the middle.
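
A small sketch with Python's statistics module, on an invented list of scores, shows how the three measures can differ:

```python
# Sketch of the three measures of central tendency on made-up scores.
import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 7, 9, 40]   # note the extreme value 40

print(statistics.mean(scores))     # 8.3  - pulled upward by the extreme score
print(statistics.median(scores))   # 5.0  - halfway point, resistant to the outlier
print(statistics.mode(scores))     # 5    - most frequently occurring value
```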

MEASURES OF DISPERSION
In data analysis, the purpose of statistically computing a measure of dispersion is to discover the extent to which scores differ from, cluster around, or spread out from a measure of central tendency.
- The most commonly used measure of dispersion is the standard deviation. You first compute the variance, which is calculated by subtracting the mean from each number, squaring the result, and dividing the grand total (the Sum of Squares) by how many numbers there are. The square root of the variance is the standard deviation (a short sketch of this calculation follows below).
- The standard deviation is important for many reasons. One reason is that, once you know the standard deviation, you can standardize by it. Standardization is the process of converting raw scores into what are called standard scores, which allow you to better compare groups of different sizes.

Standardization isn't required for data analysis, but it becomes useful when you want to compare different subgroups in your sample, or between groups in different studies.
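
A brief sketch of the variance and standard deviation calculation described above, on invented scores, might look like this:

```python
# Sketch of the dispersion calculation (illustrative numbers only).
import numpy as np

scores = np.array([4, 8, 6, 5, 3, 7, 9, 6], dtype=float)

deviations = scores - scores.mean()          # subtract the mean from each number
sum_of_squares = np.sum(deviations ** 2)     # square and total the deviations
variance = sum_of_squares / len(scores)      # divide by how many numbers there are
std_dev = np.sqrt(variance)                  # square root of the variance

print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```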

A standard score is called a z-score (not to be confused with a z-test), and is calculated by subtracting the mean from each and every number and dividing by the standard deviation. Once you have converted your data into standard scores, you can then use probability tables that exist for estimating the likelihood that a certain raw score will appear in the population. This is an example of using a descriptive statistic (standard deviation) for inferential purposes.
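
As a sketch, the standardization and table lookup described above can be done with scipy's normal distribution; the scores are the same invented values used earlier, and the normality assumption is only illustrative.

```python
# Sketch of standardization and of using the normal curve in place of a
# printed probability table (values are illustrative).
import numpy as np
from scipy import stats

scores = np.array([4, 8, 6, 5, 3, 7, 9, 6], dtype=float)
z = (scores - scores.mean()) / scores.std()          # z-scores

# Probability of observing a raw score at least as high as 9,
# assuming the scores are approximately normally distributed.
p_above_9 = stats.norm.sf((9 - scores.mean()) / scores.std())
print(np.round(z, 2))
print(f"P(score >= 9) ~ {p_above_9:.3f}")
```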

Z-TESTS, F-TESTS, AND T-TESTS


Z-TESTS
- Z-tests are not to be confused with z-scores. Z-tests come in a variety of forms, the most popular being: 1) to test the significance of correlation coefficients; and 2) to test for the equivalence of sample proportions to population proportions, as in whether the number of minorities in your sample is proportionate to the number in the population.
- Z-tests essentially check for linearity and normality, allow some rudimentary hypothesis testing, and help rule out Type I and Type II errors.
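
A hedged sketch of the second form, a one-sample z-test comparing a sample proportion with a known population proportion, might look like this (all counts and proportions are invented):

```python
# Sketch of a one-sample z-test for a proportion (figures are assumed).
from math import sqrt
from scipy import stats

p_pop = 0.30            # proportion in the population (assumed)
n, successes = 200, 48
p_hat = successes / n   # proportion observed in the sample

z = (p_hat - p_pop) / sqrt(p_pop * (1 - p_pop) / n)
p_value = 2 * stats.norm.sf(abs(z))     # two-tailed p-value
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```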

F-TESTS
F-tests are much more powerful, as they assess how much of the variance in one variable is accounted for by variance in another variable. In this sense, they are very much like the coefficient of determination. One really needs a full-fledged statistics course to gain an understanding of F-tests, so suffice it to say here that you find them most commonly with regression and ANOVA techniques. F-tests require interpretation using a table of critical values.
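
Rather than a printed table, the critical value can be looked up in code; the degrees of freedom below are assumed example values.

```python
# Sketch: looking up an F critical value instead of consulting a table.
from scipy import stats

alpha, df_between, df_within = 0.05, 1, 6
f_crit = stats.f.ppf(1 - alpha, df_between, df_within)
print(f"F critical (alpha = .05, df = {df_between}, {df_within}) = {f_crit:.2f}")
# An observed F larger than this critical value is significant at the .05 level.
```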

T-TESTS
T-tests are kind of like little F-tests, and similar to Z-tests. They are appropriate for smaller samples and relatively easy to interpret, since any calculated t over about 2.0 is, by rule of thumb, significant. T-tests can be used for one sample or two samples, and can be one-tailed or two-tailed. You use a two-tailed test if there's any possibility of bidirectionality in the relationship between your variables. The formula for the one-sample t-test is:

t = (x̄ - μ) / (s / √n)

where x̄ is the sample mean, μ is the hypothesized population mean, s is the sample standard deviation, and n is the sample size.
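
As a sketch, scipy provides both the one-sample and the independent two-sample t-test; the scores and the hypothesized population mean below are invented.

```python
# Sketch of one-sample and two-sample t-tests on made-up scores.
from scipy import stats

group_a = [23, 25, 28, 30, 26, 27, 24, 29]
group_b = [31, 35, 29, 33, 36, 32, 34, 30]

# One-sample: does group_a differ from an assumed population mean of 25?
t1, p1 = stats.ttest_1samp(group_a, popmean=25)

# Two-sample (independent groups), two-tailed by default.
t2, p2 = stats.ttest_ind(group_a, group_b)

print(f"one-sample: t = {t1:.2f}, p = {p1:.3f}")
print(f"two-sample: t = {t2:.2f}, p = {p2:.4f}")
```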

ANOVA
Analysis of Variance (ANOVA) is a data analytic technique based on the idea of comparing explained variance with unexplained variance, somewhat like a comparison of the coefficient of determination with the coefficient of alienation. It uses a rather unique computational formula which involves squaring almost every column of numbers.
- What is called the Between Sum of Squares (BSS) refers to variance in one variable explained by variance in another variable, and what is called the Within Sum of Squares (WSS) refers to variance that is not explained by variance in another variable.
- An F-test is then conducted on the ratio of the Between Mean Square to the Within Mean Square (each Sum of Squares divided by its degrees of freedom). The results are presented in what's called an ANOVA source table, which looks like the following:

ANOVA SOURCE TABLE


Source     SS      df    MS         F       p
Between    1800     1    1800.00    10.80   <.05
Within     1000     6     166.67
Total      2800     7
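
A minimal sketch of a one-way ANOVA in scipy, on three invented groups (not the data behind the table above), would be:

```python
# Sketch of a one-way ANOVA; scipy computes the F statistic
# (Between MS / Within MS) and its p-value directly.
from scipy import stats

group1 = [18, 20, 22, 24, 19]
group2 = [28, 30, 27, 31, 29]
group3 = [22, 25, 23, 26, 24]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```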

CHI-SQUARE
A technique designed for less-than-interval-level data is chi-square (pronounced "kye-square"), and the most common forms of it are the chi-square test for contingency and the chi-square test for independence. Related measures of association exist, such as Cramer's V, Proportional Reduction in Error (PRE) statistics, Yule's Q, and Phi. Essentially, all chi-square type tests involve arranging the frequency distribution of the data in what is called a contingency table of rows and columns. The marginals (the row and column totals) are then used, under the null hypothesis, to compute predicted scores called expected frequencies; these are subtracted from the observed frequencies (Observed minus Expected), and the result is expressed in the form of a ratio, or contingency coefficient. Chi-square tests are frequently seen in the literature, can easily be done by hand, and are run by computers automatically whenever a contingency table is asked for.
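
As a sketch, scipy's chi2_contingency carries out exactly this Observed-minus-Expected comparison on a contingency table; the counts below are invented.

```python
# Sketch of a chi-square test of independence on a 2 x 2 contingency table
# of observed frequencies (counts are invented for illustration).
import numpy as np
from scipy import stats

observed = np.array([[30, 10],      # rows and columns of a contingency table
                     [20, 40]])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
print("expected frequencies:\n", np.round(expected, 1))
```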

Submitted to :
