
SRM ASSIGNMENT

1. T-Test
Conditions under which it is used: A t-test compares the means of two populations through
statistical examination. A two-sample t-test is commonly used with small sample sizes to test
the difference between the samples when the variances of the two normal distributions are not
known.
A t-test uses the t-statistic, the t-distribution and the degrees of freedom to determine the
probability of a difference between populations; the test statistic is known as the t-statistic.
To compare three or more groups, an analysis of variance (ANOVA) must be used.

Example: An analyst wants to compare the amount that Pennsylvanians and Californians spend,
per month, on clothing. To do so, a sample of spending habits is taken from a randomly selected
group of people from each state. The group may be of any small to moderate size; for this
case, assume that the sample totals 200 people.

The average amount for Pennsylvanians comes out to $500; the average amount for Californians
is $1,000. The t-test asks whether the difference between the groups is representative of a
true difference between people in Pennsylvania and people in California in general, or whether
it is likely a meaningless statistical difference. In this example, even if, hypothetically,
all Pennsylvanians spent $500 per month on clothing and all Californians spent $1,000 per month
on clothing, it is highly unlikely that 200 randomly selected people all spent that exact
amount, respective to state. Thus, if an analyst obtained the results listed in the example
above, and the spread within each sample is small relative to the $500 gap, it is safe to
conclude that the difference between the sample groups indicates a significant difference
between the populations of each state as a whole.
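
The comparison described above can be sketched in a few lines of Python. This is only an illustration: the spending figures are hypothetical (the text gives only the two averages), and the sketch uses Welch's unequal-variance form of the two-sample t-statistic, which is one common choice when the population variances are unknown.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample: sample variance / n.
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, dof

# Hypothetical monthly clothing spend in dollars (assumed data, not from the text).
pennsylvania = [480, 510, 495, 520, 495]
california = [990, 1010, 1005, 980, 1015]

t_stat, dof = welch_t(pennsylvania, california)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.2f}")
```

A t-statistic this far from zero would then be compared against the t-distribution with the computed degrees of freedom to obtain a probability.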

2. Chi-Square Test
The chi-square test is one of the important nonparametric tests. It is used to compare
categorical variables in randomly selected data. The expected frequencies are calculated under
the conditions of the null hypothesis. The null hypothesis is rejected when the differences
between the observed values and the expected values are large.

The data can be examined by using the two types of Chi-square test, which are given below:

1. Chi-square goodness of fit test

It is used to assess how closely a sample matches a population. The Chi-square test
statistic is,

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

with k-1 degrees of freedom, where Oi is the observed count, k is the number of categories,
and Ei is the expected count.
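
As a rough sketch of the goodness-of-fit statistic, the snippet below sums the squared differences between observed and expected counts. The die-roll counts are hypothetical and chosen only for illustration.

```python
def chi_square_gof(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum of (O_i - E_i)^2 / E_i over the k categories.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Hypothetical data: 120 rolls of a die; a fair die expects 20 per face.
observed = [18, 22, 16, 25, 19, 20]
expected = [20] * 6

gof_stat, gof_dof = chi_square_gof(observed, expected)
print(f"chi-square = {gof_stat:.2f} with {gof_dof} degrees of freedom")
```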
2. Chi-square test for independence of two variables

It is used to check whether the variables are independent of each other or not. The Chi-square
test statistic is,

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

with (r-1)(c-1) degrees of freedom, where the sum runs over all cells of the table, Oi is the
observed count, r is the number of rows, c is the number of columns, and Ei is the expected
count.

Example: Is gender independent of education level? A random sample of 395 people was
surveyed and each person was asked to report the highest education level they obtained. The
data that resulted from the survey is summarized in the following table:

Are gender and education level dependent at the 5% level of significance? In other words, given
the data collected above, is there a relationship between the gender of an individual and the
level of education that they have obtained? Working this out,
χ² = (60 − 50.886)²/50.886 + ⋯ + (57 − 48.132)²/48.132 = 8.006. With 2 rows and 4 columns there
are (2−1)(4−1) = 3 degrees of freedom, and the critical value of χ² with 3 degrees of freedom
is 7.815. Since 8.006 > 7.815, we reject the null hypothesis and conclude that education level
depends on gender at the 5% level of significance.
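
The calculation can be checked with a short sketch. The survey table itself is not reproduced in this text, so the cell counts below are an assumed reconstruction: a 2 × 4 table chosen to agree with the figures quoted above (395 people, 60 observed against 50.886 expected, 57 observed against 48.132 expected). Treat them as illustrative rather than as the original data.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Assumed counts (rows: two genders; columns: four education levels).
counts = [
    [60, 54, 46, 41],
    [40, 44, 53, 57],
]

chi2, chi2_dof = chi_square_independence(counts)
critical_value = 7.815  # 5% critical value for 3 degrees of freedom, as quoted in the text
print(f"chi-square = {chi2:.3f}, dof = {chi2_dof}, reject H0: {chi2 > critical_value}")
```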

3. Z-Test
A z-test is a statistical test used to determine whether two population means are different when
the variances are known and the sample size is large. The test statistic is assumed to have
a normal distribution, and nuisance parameters such as the standard deviation should be known
for an accurate z-test to be performed. The z-test is best used with samples of more than 30
observations because, under the central limit theorem, as the sample size gets larger, the
sample means are considered to be approximately normally distributed. When conducting a z-test,
the null and alternative hypotheses, alpha and z-score should be stated. Next, the test
statistic should be calculated, and the results and conclusion stated.
Example: An investor wishes to test whether the average daily return of a stock is greater than
1%. A simple random sample of 50 returns is calculated and has an average of 2%. Assume
the standard deviation of the returns is 2.50%. Therefore, the null hypothesis is that the
average, or mean, is equal to 1%. Conversely, the alternative hypothesis is that the mean
return is greater than 1%. Assume an alpha of 0.05 (5%) is selected with a two-tailed test.
Consequently, 0.025 (2.5%) of the distribution lies in each tail, and the critical values are
1.96 and -1.96. If the value of z is greater than 1.96 or less than -1.96, the null hypothesis
is rejected.
The value of z is calculated by subtracting the hypothesized average daily return, or 1% in
this case, from the observed average of the sample. Next, divide the resulting value by the
standard deviation divided by the square root of the number of observations. Therefore, the
test statistic is (0.02 - 0.01) / (0.025 / (50) ^ (1/2)) = 2.83. The investor rejects the null
hypothesis since z is greater than 1.96 and concludes that the average daily return is greater
than 1%.
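
The arithmetic above can be reproduced with the following sketch, which uses only the numbers given in the example. The function name is an illustrative choice; the critical value is taken from the standard normal distribution in Python's `statistics` module.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, hypothesized_mean, sigma, n):
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (sigma / math.sqrt(n))

alpha = 0.05
z = one_sample_z(sample_mean=0.02, hypothesized_mean=0.01, sigma=0.025, n=50)
# Two-tailed critical value: alpha / 2 of the distribution lies in each tail.
critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_null = abs(z) > critical
print(f"z = {z:.2f}, critical value = {critical:.2f}, reject H0: {reject_null}")
```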

4. ANOVA
Analysis of variance (ANOVA) is a collection of statistical models and their associated
procedures (such as "variation" among and between groups) used to analyse the differences
among group means. ANOVA was developed by statistician and evolutionary
biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable
is partitioned into components attributable to different sources of variation. In its simplest form,
ANOVA provides a statistical test of whether or not the means of several groups are equal, and
therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing
(testing) three or more means (groups or variables) for statistical significance. It is conceptually
similar to multiple two-sample t-tests, but is more conservative (results in less type I error) and
is therefore suited to a wide range of practical problems.
Example: A one-way ANOVA can be used to study the effect of tea on weight loss by forming
three groups: green tea, black tea, and no tea. A two-way ANOVA allows a company to compare
worker productivity based on two independent variables, say salary and skill set. It tests the
effect of the two factors at the same time and is used to observe the interaction between
them.
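
A minimal sketch of the one-way case follows. The weight-loss figures are hypothetical. The code partitions the total variation into a between-group and a within-group component and forms the F ratio from them.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F statistic, between-group dof, within-group dof)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical weight loss in kg for each group (assumed data).
green_tea = [3.0, 4.0, 5.0]
black_tea = [2.0, 3.0, 4.0]
no_tea = [1.0, 2.0, 3.0]

f_stat, df_between, df_within = one_way_anova_f([green_tea, black_tea, no_tea])
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

The F value would then be compared with the F-distribution for those degrees of freedom to decide whether the group means differ.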

5. Correlation Analysis
Correlation analysis is used to describe the degree to which one variable is linearly related
to another, and it determines the strength of the relationship between two numerically
measured continuous variables. A positive correlation exists if one variable increases
simultaneously with the other, while a negative correlation exists if one variable decreases
when the other increases. Pearson's product-moment coefficient is the measure of correlation
and ranges (depending on the correlation) between +1 and -1. +1 indicates the strongest
positive correlation possible, and -1 indicates the strongest negative correlation possible.
Therefore, the closer the coefficient is to either of these numbers, the stronger the
correlation of the data it represents. 0 indicates no correlation, so values closer to zero
indicate a weaker correlation than those closer to +1 or -1. For the Pearson r correlation,
both variables should be normally distributed. Other assumptions include linearity and
homoscedasticity.
Example: Is there a relationship between job satisfaction, as measured by the JSS, and income,
measured in dollars? Or, is there a relationship between unemployment and crime rates?
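
As an illustration of the coefficient, the sketch below computes Pearson's r directly from its definition. The satisfaction scores and incomes are hypothetical values invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: job satisfaction score and income in thousands of dollars.
satisfaction = [120, 135, 150, 160, 175]
income = [30, 38, 41, 52, 60]

r = pearson_r(satisfaction, income)
print(f"r = {r:.3f}")  # a value close to +1 indicates a strong positive correlation
```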

6. Regression-Simple
Used to explain the relationship between one dependent variable and one independent variable.
The simplest form of the regression equation is defined by the formula y = mx + c. First,
regression may be used to identify the strength of the effect that the independent variable(s)
have on a dependent variable. Second, regression analysis helps us understand how much the
dependent variable changes with a change in one or more independent variables. Third,
regression analysis predicts trends and future values. In simple linear regression, the
predictions of Y, when plotted as a function of X, form a straight line.
Example: Last year, five randomly selected students took a math aptitude test before they
began their statistics course. The Statistics Department has the following questions.
a) What linear regression equation best predicts statistics performance, based on math
aptitude scores? b) If a student made an 80 on the aptitude test, what grade would we expect
her to make in statistics?
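
The two questions can be answered with a least-squares fit of y = mx + c. The scores for the five students are not given in this text, so the values below are assumed purely for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Assumed scores for five students (hypothetical data).
aptitude = [95, 85, 80, 70, 60]
statistics_grade = [85, 95, 70, 65, 70]

m, c = fit_line(aptitude, statistics_grade)
predicted = m * 80 + c  # expected statistics grade for an aptitude score of 80
print(f"y = {m:.3f}x + {c:.3f}; predicted grade at x = 80: {predicted:.1f}")
```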

7. Regression-Multiple
Multiple linear regression requires at least two independent variables, which can be nominal,
ordinal, or interval/ratio level variables. It relies on certain assumptions, namely: a) there
must be a linear relationship between the dependent and independent variables, b) the errors
between the observed and predicted values should be normally distributed, c) there is no
multicollinearity in the data, and d) the errors are homoscedastic. Further, the dependent
variable must be measured on a continuous scale, the two or more independent variables can be
either continuous or categorical, and there should be no significant outliers, high leverage
points or highly influential points.
Example: A real estate agent might record for each listing the size of the house (in square
feet), the number of bedrooms, the average income in the respective neighbourhood according to
census data, and a subjective rating of the appeal of the house. Once this information has
been compiled for various houses, it can be examined how these measures relate to the price
for which a house is sold. For example, one might find that the number of bedrooms is a better
predictor of the price for which a house sells in a particular neighbourhood than how "pretty"
the house is (subjective rating). One may also detect "outliers," that is, houses that
should sell for more, given their location and characteristics.
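
A small sketch of how such a model could be fitted is shown below. It solves the normal equations (XᵀX)b = Xᵀy with Gaussian elimination, using only two of the predictors for brevity. The listings are hypothetical and were constructed so that price = 50 + 100 × size + 10 × bedrooms exactly, which makes the recovered coefficients easy to check.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        partial = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - partial) / aug[i][i]
    return x

def fit_multiple_regression(rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical listings: (size in thousands of square feet, number of bedrooms).
houses = [(1.0, 2), (1.5, 3), (1.2, 3), (2.0, 4), (1.8, 3)]
prices = [170.0, 230.0, 200.0, 290.0, 260.0]  # in thousands of dollars (assumed)

coefficients = fit_multiple_regression(houses, prices)
print([round(b, 3) for b in coefficients])
```

Real data would not fit exactly, and the residuals would then be checked against the assumptions listed above.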

8. Factor Analysis
It is used to reduce a large number of variables into a smaller number of factors. This
technique extracts the maximum common variance from all variables and puts it into a common
score. As an index of all variables, this score can be used for further analysis. The technique
is also based on the following assumptions, namely, a linear relationship, no
multicollinearity, inclusion of the relevant variables in the analysis, and a true correlation
between variables and factors. In every factor analysis, there are the same number of factors
as there are variables. Each factor captures a certain amount of the overall variance in the
observed variables, and the factors are always listed in order of how much variation they
explain.
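
To illustrate the idea that the first factor captures the largest share of the overall variance, the sketch below finds the largest eigenvalue of a correlation matrix by power iteration. This is a principal-component style extraction used here as a simplified stand-in; a full factor analysis would also estimate unique variances and usually rotate the factors. The correlation matrix is hypothetical.

```python
import math

def largest_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a symmetric matrix by power iteration."""
    n = len(matrix)
    vector = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-zero start
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient of the (unit-length) converged vector.
    product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    return sum(v * p for v, p in zip(vector, product))

# Hypothetical correlation matrix for three observed variables.
correlations = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

eigenvalue = largest_eigenvalue(correlations)
# The eigenvalues of a correlation matrix sum to the number of variables,
# so this ratio is the share of total variance captured by the first factor.
share = eigenvalue / len(correlations)
print(f"first eigenvalue = {eigenvalue:.3f}, share of variance = {share:.1%}")
```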
Example: Suppose we need to develop a test that will help a company select candidates who are
good team players. How might we go about it? Suppose a psychologist conducts an exploratory
factor analysis on the company's requirements and finds 20 different aspects or traits that
make a good team player (for instance "sympathy" and "neighbourliness"). Further factor
analysis and testing on small samples reveals, however, that all of the 20 aspects are merely
manifestations of just three main factors: interpersonal skills, honesty and extroversion. The
psychologist can conduct further rounds of factor analysis, testing and refinement to find
answers to two main questions:
• What is the minimum number of factors needed to explain all the variation we find in
the company's data?
• How well do these factors describe all of the data?
Eventually the psychologist can arrive at the main hidden factors in the data and design the
inventory accordingly.
