Sample

Inferential Statistics
• Inferential statistics are used to draw inferences about a population from a sample.
• The goal of statistical analysis is to answer two questions:
• 1) Is there a significant effect, association, or difference between the variables of interest? (That is, can we reject the null hypothesis?)
• 2) If there is an effect, association, or difference, how big is it?
Population: All the items under consideration in any field of inquiry constitute a “universe” or population.
• Finite population
• Infinite population
Sample: A representative part of the population that is selected for analysis or investigation.
Census Survey: When data are collected from every member of the population, the survey is known as a census survey.
Sample Survey: When data are collected from only some members of the population, the survey is known as a sample survey.
Parameter: A numerical characteristic that describes the population is known as a parameter.
Statistic: A numerical characteristic of the sample, given in the form of a summary measure, is known as a statistic.
Variable: i) Random or stochastic variable: A variable whose value cannot be fully controlled or determined prior to observation is called a random variable. It assumes different values with some probabilities. Ex: the outcome of a coin toss.
ii) Nonrandom variable: A variable that is fully controllable or at least fully predictable. Ex: a constant.
Continuous variable: A variable that can assume any value on the numerical axis or a part of it. Ex: time, temperature, income, expenditure, height, weight, etc.
Discrete variable: A variable that can assume only some specific values on the numerical axis. Ex: number of children in a family, number of dots on a die after a toss, or any binary variable.
Sample Design
The way of selecting “sample” is known as the “Sample Design”.
Sample design and data collection are the most important steps of a research operation. Maximum effort is needed for data collection, particularly for economic data collected from establishments or households.
Convenience Sampling
• Sometimes known as grab or opportunity sampling or accidental or haphazard sampling.
• A type of nonprobability sampling in which the sample is drawn from the part of the population that is close to hand, that is, readily available and convenient.
• The researcher using such a sample cannot scientifically make generalizations about the total population from this sample because it would not be representative enough.
• For example, if an interviewer conducts a survey at a shopping center early in the morning on a given day, the people he or she can interview are limited to those present there at that time. Their views would not represent those of other members of society in the area, as they might if the survey were conducted at different times of day and several times per week.
• This type of sampling is most useful for pilot testing.
Judgmental sampling or Purposive sampling
• The researcher chooses the sample based on who they think would be appropriate for the
study. This is used primarily when there is a limited number of people that have expertise in the
area being researched
Quota Sampling
The population is first segmented into mutually exclusive sub-groups, just as in stratified sampling.
Then judgment is used to select subjects or units from each segment based on a specified proportion.
For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60.
It is this second step that makes the technique one of non-probability sampling.
In quota sampling the selection of the sample is non-random. For example, interviewers might be tempted to interview those who look most helpful. The problem is that these samples may be biased because not everyone gets a chance of selection.
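The two steps of quota sampling can be sketched in a few lines of Python. This is only an illustration: the population records are randomly generated and hypothetical, and "whoever is encountered first" stands in for the interviewer's judgment, which the notes do not specify.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of (sex, age) records; all values are made up.
population = [
    {"id": i, "sex": random.choice(["F", "M"]), "age": random.randint(18, 80)}
    for i in range(20_000)
]

quotas = {"F": 200, "M": 300}   # quotas from the example in the text
selected = {"F": [], "M": []}

# Step 1: segment by sex, restricted to ages 45-60.
# Step 2: take whoever is encountered first until each quota is full.
# Encounter order stands in for interviewer judgment here; it is not a
# random draw, which is why this is non-probability sampling.
for person in population:
    if 45 <= person["age"] <= 60:
        group = selected[person["sex"]]
        if len(group) < quotas[person["sex"]]:
            group.append(person)

print({sex: len(people) for sex, people in selected.items()})
```

Because selection stops as soon as a quota is filled, people who appear late in the encounter order have no chance of being chosen, which is the source of the bias described above.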
• A statistic is a numerical measure of a summary characteristic of a sample, e.g., a mean, a difference in means or proportions, or a correlation or regression coefficient.
• A parameter is a numerical measure of a summary characteristic of a population.
An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter. It is a function of a random sample.
Theory of Estimation
To estimate the unknown parameters, the usual procedure is to assume that we have a random sample of size n from a known probability distribution and to use the sample data to estimate the unknown parameters. This process is called the problem of estimation. The theory of estimation can be divided into two parts: point estimation and interval estimation.
Point estimation: The aim of a point estimator is to use all the data and prior information to calculate a single value that is our best guess of the actual, or true, value of the parameter. Computing such a value from the sample is the statistical procedure called “point estimation”.
The absolute value of the difference between an unbiased point estimate and the corresponding population parameter is called the sampling error.
Interval estimation: Use the single value and the variance of the estimator to form an interval that is likely to contain the true parameter.
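As a minimal sketch of the two kinds of estimate, the Python snippet below computes a point estimate of a population mean and a 95% interval around it. The sample values are hypothetical, and the normal critical value (about 1.96) is an assumption made here for simplicity; the notes do not say which distribution to use.

```python
import math
import statistics

# Hypothetical sample of n = 40 observations (values are made up).
sample = [52, 48, 55, 60, 47, 51, 49, 58, 53, 50,
          46, 57, 54, 52, 49, 61, 45, 56, 50, 53,
          48, 59, 51, 47, 55, 52, 50, 54, 49, 57,
          53, 46, 58, 51, 50, 56, 48, 52, 54, 49]

n = len(sample)
point_estimate = statistics.fmean(sample)             # single best guess of the mean
std_error = statistics.stdev(sample) / math.sqrt(n)   # estimated sd of the estimator

# Interval estimate: point estimate +/- critical value * standard error.
# The normal critical value is an assumption; with small samples a
# t critical value would normally be used instead.
z = statistics.NormalDist().inv_cdf(0.975)
lower = point_estimate - z * std_error
upper = point_estimate + z * std_error

print(f"point estimate: {point_estimate:.2f}")
print(f"95% interval:   ({lower:.2f}, {upper:.2f})")
```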
A sampling distribution is the distribution of sample statistics computed on the set of all possible
random samples of size n that could be drawn from a population
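For a very small finite population, the set of all possible samples can be listed explicitly, which makes the definition concrete. The sketch below uses a hypothetical population of five values and samples of size 2 drawn without replacement; both choices are assumptions for illustration.

```python
from itertools import combinations
import statistics

# Hypothetical finite population of N = 5 values (made up for illustration).
population = [2, 4, 6, 8, 10]
n = 2

# Every possible sample of size n drawn without replacement.
all_samples = list(combinations(population, n))

# The sampling distribution of the mean: one sample mean per possible sample.
sample_means = [statistics.fmean(s) for s in all_samples]

print(f"number of possible samples: {len(all_samples)}")
print(f"population mean:            {statistics.fmean(population)}")
print(f"mean of the sample means:   {statistics.fmean(sample_means)}")
```

In this setup the mean of the sample means coincides with the population mean, which is what unbiasedness of the sample mean implies.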
Econometrics means “economic measurement”. Broadly defined, it is the study of economics using statistical methods. Econometrics is an amalgam of economic theory, mathematical economics, economic statistics, and mathematical statistics. It expresses the laws of economics in quantitative, mathematical form.
Methodology of Econometrics
• 1. Statement of theory or hypothesis.
• 2. Specification of the mathematical model of the theory
• 3. Specification of the statistical, or econometric model
• 4. Obtaining the data
• 5. Estimation of the parameters of the econometric model
• 6. Hypothesis testing
• 7. Forecasting or prediction
• 8. Using the model for control or policy purposes.
Cross-sectional Data
Data collected at a given point in time, e.g., a sample of households or firms, for each of which a number of variables such as turnover, operating margin, and market value of shares are measured.
From an econometric point of view, it is important that the observations constitute a random sample from the underlying population.
Time Series Data
A time series consists of observations on one or more variables over time. Typical examples are daily share prices, interest rates, and CPI values.
An important additional feature compared with cross-sectional data is the ordering of the observations, which may convey important information.
Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the dependent variable in terms of the known or fixed (in repeated sampling) values of the explanatory variables.
In correlation analysis, the primary objective is to measure the strength or degree of linear association between two variables. In regression analysis, by contrast, we try to estimate or predict the average value of one variable on the basis of the fixed values of other variables.
In regression analysis the dependent variable is assumed to be statistical, random, or stochastic, that is, to have a probability distribution. The explanatory variables, on the other hand, are assumed to have fixed values (in repeated sampling). In correlation analysis we treat any (two) variables symmetrically; there is no distinction between the dependent and explanatory variables.
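The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below uses hypothetical paired observations and hand-written helper functions (`covariance`, `correlation`, `slope`), all of which are illustrative names introduced here, not part of the notes.

```python
import statistics

# Hypothetical paired observations (made up for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.5]

def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

def correlation(a, b):
    """Pearson correlation: symmetric in its two arguments."""
    return covariance(a, b) / (statistics.stdev(a) * statistics.stdev(b))

def slope(dependent, explanatory):
    """OLS slope from regressing `dependent` on `explanatory`."""
    return covariance(dependent, explanatory) / statistics.variance(explanatory)

print(f"corr(x, y) = {correlation(x, y):.4f}")
print(f"corr(y, x) = {correlation(y, x):.4f}")   # same value either way
print(f"slope of y on x = {slope(y, x):.4f}")
print(f"slope of x on y = {slope(x, y):.4f}")    # depends on which variable is dependent
```

Swapping the variables leaves the correlation unchanged but changes the regression slope, because regression treats one variable as dependent and the other as explanatory.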
The Simple Regression Model
The error term is a combination of a number of effects, such as:
1. Omitted variables: It accounts for the effects of variables omitted from the model.
2. Nonlinearities: It captures the effects of nonlinearities between y and x. Thus, if the true model is, for example, $y = \beta_1 + \beta_2 x + \beta_3 x^2 + u$ and we assume that it is $y = \beta_1 + \beta_2 x + v$, then the effect of $x^2$ is absorbed into $v$. In fact, $v = \beta_3 x^2 + u$.
3. Measurement errors: Errors in measuring y and x are absorbed into the error term.
4. Unpredictable effects: The error term also includes inherently unpredictable random effects.
Assumption 1: Linear regression model.
Assumption 2: X values are fixed in repeated sampling; that is, X is assumed to be nonstochastic.
Assumption 3: Zero mean value of the disturbance ui.
Assumption 4: Equal variance (homoscedasticity) of ui.
Assumption 5: No autocorrelation between the disturbances.
Assumption 6: Zero covariance between ui and Xi.
Assumption 7: The number of observations n must be greater than the number of parameters to be estimated.
Assumption 8: Variability in X values.
Interpretation of Result
$\hat{Y}_i = 1.67 + 1.50 X_i$
Each point on the regression line gives an estimate of the expected or mean value of Y corresponding to the chosen X value; that is, $\hat{Y}_i$ is an estimate of $E(Y|X_i)$. The value of $\hat{\beta}_2 = 1.50$, which measures the slope of the line, shows that, within the sample range of X between 1 and 3, as X increases by 1 unit, the estimated increase in the mean or average value of Y is about 1.5 units.
The value of $\hat{\beta}_1 = 1.67$, which is the intercept of the line, indicates the average level of Y when X is zero. In regression analysis such a literal interpretation of the intercept term may not always be meaningful. Perhaps it is best to interpret the intercept term as the mean or average effect on Y of all the variables omitted from the regression model.
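To see where an intercept of 1.67 and a slope of 1.50 could come from, the sketch below fits the two-variable model by ordinary least squares. The three data points are hypothetical: they were chosen here only so that the fit matches the quoted coefficients, and they are not the observations behind the result in the notes.

```python
# Hypothetical data, chosen only so that the fit matches the coefficients
# quoted in the text; these are not the author's observations.
X = [1.0, 2.0, 3.0]
Y = [3.0, 5.0, 6.0]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# OLS formulas for the two-variable model Y = b1 + b2*X + u.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)
b2 = s_xy / s_xx            # slope
b1 = y_bar - b2 * x_bar     # intercept

print(f"intercept b1 = {b1:.2f}, slope b2 = {b2:.2f}")

# Each fitted value estimates the mean of Y at that X.
for x in X:
    print(f"X = {x:.0f}: estimated mean of Y = {b1 + b2 * x:.2f}")
```

Moving from one X value to the next raises the fitted value by the slope, which is the "increase in the mean of Y per unit of X" interpretation given above.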
Goodness of fit of the fitted regression line
The goodness of fit of the fitted regression line to a set of data measures how “well” the sample regression line fits the data.
The overall goodness of fit of the regression model is measured by the coefficient of determination $r^2$ (two-variable case) or $R^2$ (multiple regression). It tells what proportion of the variation in the dependent variable, or regressand, is explained by the explanatory variable, or regressor.
$R^2$ lies between 0 and 1; the closer it is to 1, the better the fit.
The $r^2$ defined previously can also be computed as the squared coefficient of correlation between the actual $Y_i$ and the estimated $\hat{Y}_i$.
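The two ways of computing the coefficient of determination can be compared directly. The sketch below uses hypothetical data and the usual definition $r^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$ (residual and total sums of squares), which is assumed here because the notes do not reproduce the formula.

```python
# Hypothetical data (made up for illustration).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 6.0]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
b1 = y_bar - b2 * x_bar
Y_hat = [b1 + b2 * x for x in X]   # fitted values

# Definition: share of total variation in Y explained by the regression.
tss = sum((y - y_bar) ** 2 for y in Y)                 # total sum of squares
rss = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))    # residual sum of squares
r2_from_sums = 1 - rss / tss

# Alternative: squared correlation between actual and fitted Y.
yh_bar = sum(Y_hat) / n
cov = sum((y - y_bar) * (yh - yh_bar) for y, yh in zip(Y, Y_hat))
r2_from_corr = cov ** 2 / (tss * sum((yh - yh_bar) ** 2 for yh in Y_hat))

print(f"r^2 = 1 - RSS/TSS      : {r2_from_sums:.4f}")
print(f"r^2 = corr(Y, Y_hat)^2 : {r2_from_corr:.4f}")
```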
Consider the result of the consumption-income relationship.
As the regression results show, there is a positive association between income and consumption.
The marginal propensity to consume (MPC) is about 0.71, suggesting that if income goes up by a dollar, the average personal consumption expenditure (PCE) goes up by about 71 cents.
The intercept value of about −184 tells us that if income were zero, the PCE would be about −184 billion dollars. The intercept term is negative, and it may have no economic meaning.
The $r^2$ value of 0.9984 means that approximately 99.8 percent of the variation in the PCE is explained by variation in GDP. Since $r^2$ can be at most 1, we can say that the regression line fits our data extremely well.
Degrees of Freedom: The term number of degrees of freedom (df) means the total number of observations in the sample (= n) less the number of independent (linear) constraints or restrictions put on them.
In other words, it is the number of independent observations out of a total of n observations. For example, before the residual sum of squares (RSS) can be computed, $\hat{\beta}_1$ and $\hat{\beta}_2$ must first be obtained. These two estimates therefore put two restrictions on the RSS, so there are n − 2, not n, independent observations to compute the RSS. Following this logic, in the three-variable regression the RSS will have n − 3 df, and for the k-variable model it will have n − k df. The general rule is: df = n − (number of parameters estimated).
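The general rule can be written as a one-line helper. The function name `residual_df` and the sample sizes below are hypothetical, introduced only to illustrate the rule.

```python
def residual_df(n_observations: int, n_parameters: int) -> int:
    """Degrees of freedom of the RSS: n minus the number of parameters estimated."""
    if n_observations <= n_parameters:
        # The number of observations must exceed the number of parameters.
        raise ValueError("need more observations than estimated parameters")
    return n_observations - n_parameters

# Two-variable model: intercept and slope are estimated, so k = 2.
print(residual_df(10, 2))   # → 8
# Three-variable model: k = 3.
print(residual_df(10, 3))   # → 7
```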
p value: The probability value, also known as the observed or exact level of significance, or the exact probability of committing a Type I error. The p value is defined as the lowest significance level at which a null hypothesis can be rejected.
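The "lowest significance level" reading of the p value can be illustrated with a short sketch. It assumes a test statistic that is standard normal under the null hypothesis and a two-sided test; the observed value 2.2 is hypothetical, and the notes do not tie the p value to any particular test.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Two-sided p value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z_observed = 2.2   # hypothetical test statistic
p = two_sided_p_value(z_observed)
print(f"p value = {p:.4f}")

# H0 is rejected at level alpha exactly when alpha >= p, so p is the
# lowest significance level at which H0 can be rejected.
for alpha in (0.01, 0.05, 0.10):
    decision = "reject H0" if p <= alpha else "do not reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```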
Hypothesis: An unproven statement about a factor that is of interest to the researcher.
• Often the hypothesis is a possible answer to the research question.
• An important role of the hypothesis is to suggest variables to be included in the model.
Null Hypothesis:
• A statement in which no difference or effect is expected. If the null hypothesis is not rejected, no change will be made.
• It is always the hypothesis that is tested.
• It refers to a specific value of the population parameter.
• It is formulated in such a way that its rejection leads to the acceptance of the desired conclusion.
Alternative Hypothesis:
• A statement in which some difference or effect is expected. Accepting it leads to a change in opinion or action.
• It represents the conclusion for which evidence is sought.
Level of significance: When we draw inferences about a population parameter, there is a risk that an incorrect conclusion will be reached. Two types of error may occur:
• Type I error: Occurs when the sample result leads to rejection of the null hypothesis when it is in fact true. The level of significance is the probability of committing a Type I error. The appropriate level of significance depends on the cost of making a Type I error.
• Type II error: Occurs if the null hypothesis is not rejected when it is in fact false.
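That the level of significance is the probability of a Type I error can be checked by simulation. In the sketch below every sample is generated with the null hypothesis true, so each rejection is a Type I error. The setup (normal data, known standard deviation, two-sided z test, 5,000 trials) is an assumption chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(42)  # fixed seed so the sketch is repeatable

alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
mu0, sigma, n = 0.0, 1.0, 30                     # hypothetical setup; H0: mu = mu0
trials = 5_000

rejections = 0
for _ in range(trials):
    # Data are generated with H0 true, so every rejection is a Type I error.
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    if abs(z) > critical:
        rejections += 1

type_i_rate = rejections / trials
print(f"empirical Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

The empirical rejection rate should land close to the chosen alpha, up to simulation noise.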
Steps in Simple Linear Regression Analysis

- Plot the scatter diagram
- Formulate the regression model
- Estimate the parameters
- Determine the strength and significance of the association
- Test the significance of the parameters
- Examine the residuals
Steps in Hypothesis Testing:

• Formulate H0 and H1
• Select the appropriate test statistic
• Choose the level of significance
• Calculate the test statistic
• Determine the probability associated with the test statistic
• Compare the probability with the level of significance
• Reject or do not reject H0
• Draw a conclusion
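The steps above can be traced in a short sketch. It assumes a one-sample, two-sided z test with a known population standard deviation; the sample values, the hypothesized mean of 100, and sigma = 4 are all hypothetical, and the notes do not prescribe this particular test.

```python
import math
from statistics import NormalDist, fmean

# Hypothetical sample; the population sd is assumed known (sigma = 4).
sample = [102, 98, 105, 110, 97, 101, 99, 108, 103, 100,
          96, 107, 104, 102, 99, 111]
sigma = 4.0

# Step 1: formulate H0 and H1.
mu0 = 100.0                      # H0: mu = 100, H1: mu != 100
# Step 2: select the test statistic, z = (x_bar - mu0) / (sigma / sqrt(n)).
# Step 3: choose the level of significance.
alpha = 0.05
# Step 4: calculate the test statistic.
n = len(sample)
z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
# Step 5: determine the probability associated with the test statistic.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
# Steps 6 and 7: compare the probability with alpha and decide.
reject_h0 = p_value <= alpha
# Step 8: draw the conclusion.
print(f"z = {z:.3f}, p value = {p_value:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```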
