• Inferential statistics are used to draw inferences about a population from a sample.
• The goal of statistical analysis is to answer two questions:
• 1) Is there a significant effect, association, or difference between the variables of interest
(i.e., can we reject the null hypothesis)?
• 2) If there is an effect, association, or difference, how big is it?
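The two questions can be illustrated with a small sketch. It assumes two independent groups with equal variances, uses made-up numbers, and picks the pooled two-sample t statistic for question 1 and Cohen's d for question 2; both are illustrative choices, not the only possible ones.

```python
import math
import statistics

# Hypothetical measurements for two groups (made-up numbers for illustration).
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.2]
group_b = [4.6, 4.4, 5.0, 4.9, 5.1, 4.3, 4.8, 4.7]


def pooled_sd(x, y):
    """Pooled standard deviation of two samples (equal-variance assumption)."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    return math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))


def two_sample_t(x, y):
    """Question 1: t statistic for H0 'the two population means are equal'."""
    diff = statistics.mean(x) - statistics.mean(y)
    return diff / (pooled_sd(x, y) * math.sqrt(1 / len(x) + 1 / len(y)))


def cohens_d(x, y):
    """Question 2: size of the difference in pooled standard deviation units."""
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd(x, y)


t_stat = two_sample_t(group_a, group_b)
d = cohens_d(group_a, group_b)
df = len(group_a) + len(group_b) - 2
print(f"t = {t_stat:.2f} on {df} df, Cohen's d = {d:.2f}")
```

The t statistic is compared with a critical value (or converted to a p value) to decide whether to reject the null hypothesis; d reports the size of the difference independently of the sample size.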
Population: All the items under consideration in any field of inquiry constitute a “universe” or population.
• Finite population
• Infinite population
Census Survey: When data are collected from every member of the population, the survey is known as a
census survey.
Sample Survey: When data are collected from only some members of the population, the survey is known
as a sample survey.
Statistic: A numerical characteristic of a sample, given in the form of a summary measure, is known as a
statistic.
Variable: i) Random or stochastic variable: A variable whose value cannot be fully controlled or
determined prior to observation is called a random variable. It assumes different values with
some probabilities. Ex: the outcome of a coin toss.
ii) Nonrandom variable: A variable that is fully controllable or at least fully predictable. Ex: a
constant.
Continuous variable: A variable that can assume any value on the numerical axis or on part of it. Ex: time,
temperature, income, expenditure, height, weight, etc.
Discrete variable: A variable that can assume only certain specific values on the numerical axis. Ex: number
of children in a family, number of dots showing on a die after a toss, or any binary variable.
Sample Design
Sample design and data collection are among the most important steps of a research operation. Data
collection demands the most effort, particularly for economic data gathered from establishments or
households.
Convenience Sampling
• A type of nonprobability sampling in which the sample is drawn from the part of the
population that is close to hand, that is, readily available and convenient.
• A researcher using such a sample cannot scientifically generalize to the total population,
because the sample is not sufficiently representative.
• For example, if an interviewer conducted a survey at a shopping center early in the
morning on a given day, the people available for interview would be limited to those present
at that place and time. Their views would not represent those of other members of society in
the area, as they might if the survey were conducted at different times of day and several
times per week.
• In the related technique of judgment (purposive) sampling, the researcher chooses the sample
based on who they think would be appropriate for the study. This is used primarily when only a
limited number of people have expertise in the area being researched.
Quota Sampling
The population is first segmented into mutually exclusive subgroups, just as in stratified sampling.
Then judgment is used to select subjects or units from each segment in a specified proportion.
For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45
and 60.
It is this second step that makes the technique one of nonprobability sampling: the selection within
each segment is non-random. For example, interviewers might be tempted to interview those who look
most helpful. Such samples may be biased because not everyone has a chance of selection.
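The two steps of quota sampling can be sketched as follows. This is a minimal illustration with the quotas scaled down to 2 females and 3 males, and with a hypothetical list of respondents in the order the interviewer happens to meet them; the function name and the data are assumptions made for the example.

```python
# Hypothetical respondents, listed in the order the interviewer meets them.
respondents = [
    {"id": 1, "sex": "F", "age": 47},
    {"id": 2, "sex": "M", "age": 52},
    {"id": 3, "sex": "F", "age": 30},  # outside the 45-60 age band
    {"id": 4, "sex": "F", "age": 58},
    {"id": 5, "sex": "F", "age": 50},  # female quota already filled
    {"id": 6, "sex": "M", "age": 61},  # outside the 45-60 age band
    {"id": 7, "sex": "M", "age": 45},
    {"id": 8, "sex": "M", "age": 60},
    {"id": 9, "sex": "M", "age": 55},  # never reached: all quotas filled
]

# Step 1: mutually exclusive segments with a specified quota for each.
quotas = {"F": 2, "M": 3}


def quota_sample(stream, quotas, min_age=45, max_age=60):
    """Step 2: take eligible people as they come until every quota is filled."""
    remaining = dict(quotas)
    chosen = []
    for person in stream:
        if not (min_age <= person["age"] <= max_age):
            continue
        if remaining.get(person["sex"], 0) > 0:
            chosen.append(person)
            remaining[person["sex"]] -= 1
        if all(count == 0 for count in remaining.values()):
            break
    return chosen


sample = quota_sample(respondents, quotas)
chosen_ids = [person["id"] for person in sample]
print(chosen_ids)
```

Nothing in the second step is random: who ends up in the sample depends entirely on whom the interviewer encounters first, so respondents further down the list have no chance of selection.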
Theory of Estimation
To estimate unknown parameters, the usual procedure is to assume that we have a random sample of size n
from a probability distribution of known form and to use the sample data to estimate the unknown
parameters. This process is called the problem of estimation. The theory of estimation can be divided into
two parts: point estimation and interval estimation.
Point estimation: The aim of a point estimator is to use all the data and prior information to calculate a
single value that is our best guess of the true value of the parameter. Computing such a value from the
sample is the statistical procedure called “point estimation”.
The absolute value of the difference between an unbiased point estimate and the corresponding
population parameter is called the sampling error.
Interval estimation: Use the point estimate and the variance of the estimator to form an interval that is
likely to contain the true parameter.
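As a rough sketch of both parts, the code below estimates a population mean from a made-up sample. The 95% level and the normal critical value are assumptions made to keep the example within the Python standard library; for a sample this small a t critical value would customarily be used and would give a slightly wider interval.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of n = 10 observations (made-up numbers).
sample = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 12.8, 11.6, 12.5, 12.2]
n = len(sample)

# Point estimation: a single best guess for the population mean.
point_estimate = statistics.mean(sample)

# Estimated standard error of the sample mean: s / sqrt(n).
std_error = statistics.stdev(sample) / math.sqrt(n)

# Critical value for 95% coverage under the normal approximation (assumption).
z = NormalDist().inv_cdf(0.975)

# Interval estimation: point estimate plus or minus z standard errors.
lower = point_estimate - z * std_error
upper = point_estimate + z * std_error

print(f"point estimate: {point_estimate:.3f}")
print(f"approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```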
A sampling distribution is the distribution of a sample statistic computed on the set of all possible
random samples of size n that could be drawn from a population.
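For a very small finite population the set of all possible samples can be listed explicitly. The sketch below assumes a hypothetical population of five values and samples of size n = 2 drawn without replacement, and builds the sampling distribution of the sample mean together with the sampling error of each sample.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical finite population and sample size (assumptions for illustration).
population = [2, 4, 6, 8, 10]
n = 2
population_mean = mean(population)

# The sample mean of every possible sample of size n (without replacement).
sample_means = [mean(s) for s in combinations(population, n)]

# Sampling distribution: how often each value of the sample mean occurs.
sampling_distribution = Counter(sample_means)

# Sampling error of each sample: |estimate - population parameter|.
sampling_errors = [abs(m - population_mean) for m in sample_means]

for value in sorted(sampling_distribution):
    share = sampling_distribution[value] / len(sample_means)
    print(f"sample mean {value}: probability {share:.1f}")
print("mean of the sample means:", mean(sample_means))
```

The mean of the sample means equals the population mean here, which is what unbiasedness of the sample mean looks like in this small example, even though individual samples carry a sampling error.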
Econometrics literally means “economic measurement”. Broadly defined, it is the study of economics using
statistical methods. Econometrics is an amalgam of economic theory, mathematical economics, economic
statistics, and mathematical statistics. It expresses the laws of economics in quantitative, mathematical
form.
Methodology of Econometrics
• 1. Statement of theory or hypothesis
• 2. Specification of the mathematical model of the theory
• 3. Specification of the statistical, or econometric, model
• 4. Obtaining the data
• 5. Estimation of the parameters of the econometric model
• 6. Hypothesis testing
• 7. Forecasting or prediction
• 8. Using the model for control or policy purposes
Cross-Sectional Data
Data collected at a given point in time. E.g., a sample of households or firms, for each of which a
number of variables (turnover, operating margin, market value of shares, etc.) are measured.
From an econometric point of view, it is important that the observations constitute a random sample from
the underlying population.
Time Series Data
A time series consists of observations on one or more variables over time. Typical examples are daily share
prices, interest rates, and CPI values.
An important additional feature compared with cross-sectional data is the ordering of the observations,
which may convey important information.
Regression analysis is concerned with the study of the dependence of one variable, the dependent
variable, on one or more other variables, the explanatory variables, with a view to estimating and/or
predicting the (population) mean or average value of the dependent variable in terms of the known or
fixed (in repeated sampling) values of the explanatory variables.
In correlation analysis, the primary objective is to measure the strength or degree of linear association
between two variables. In regression analysis, by contrast, we try to estimate or predict the average value
of one variable on the basis of the fixed values of other variables.
In regression analysis the dependent variable is assumed to be statistical, random, or stochastic, that
is, to have a probability distribution. The explanatory variables, on the other hand, are assumed to have
fixed values (in repeated sampling). In correlation analysis we treat any (two) variables symmetrically;
there is no distinction between the dependent and explanatory variables.
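The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below uses made-up paired observations and hand-written helper functions (the names are assumptions for the example): swapping the variables leaves the correlation coefficient unchanged but changes the regression slope.

```python
import math

# Hypothetical paired observations (made-up numbers for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]


def mean(values):
    return sum(values) / len(values)


def cross_dev(a, b):
    """Sum of cross-products of deviations from the respective means."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def correlation(a, b):
    """Coefficient of linear correlation; a and b enter symmetrically."""
    return cross_dev(a, b) / math.sqrt(cross_dev(a, a) * cross_dev(b, b))


def ols_slope(dependent, explanatory):
    """Least-squares slope of the dependent variable on the explanatory one."""
    return cross_dev(explanatory, dependent) / cross_dev(explanatory, explanatory)


r_xy = correlation(x, y)
r_yx = correlation(y, x)
slope_y_on_x = ols_slope(y, x)
slope_x_on_y = ols_slope(x, y)

print(f"r(x, y) = {r_xy:.4f}, r(y, x) = {r_yx:.4f}")
print(f"slope of y on x = {slope_y_on_x:.4f}, slope of x on y = {slope_x_on_y:.4f}")
```

The two slopes differ because each regression treats a different variable as dependent; their product equals the squared correlation coefficient.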
The Simple Regression Model
r², defined previously, can also be computed as the squared coefficient of correlation between the actual
Yᵢ and the estimated Yᵢ (Ŷᵢ).
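This equivalence can be verified on a small example. The sketch below fits a two-variable regression by least squares to made-up data, assumes the usual definition r² = 1 − RSS/TSS, and compares it with the squared correlation between the actual and fitted values.

```python
import math

# Hypothetical income (X) and consumption (Y) data, made up for illustration.
x = [80, 100, 120, 140, 160, 180, 200, 220]
y = [62, 75, 88, 96, 118, 121, 140, 151]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of the slope (b2) and intercept (b1).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

# Estimated (fitted) values of Y.
y_hat = [b1 + b2 * xi for xi in x]

# r^2 as the share of the total variation explained: 1 - RSS / TSS.
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
tss = sum((yi - y_bar) ** 2 for yi in y)
r2_from_sums = 1 - rss / tss

# r^2 as the squared correlation between actual Y and estimated Y.
y_hat_bar = sum(y_hat) / n
cov = sum((yi - y_bar) * (fi - y_hat_bar) for yi, fi in zip(y, y_hat))
var_hat = sum((fi - y_hat_bar) ** 2 for fi in y_hat)
r2_from_corr = (cov / math.sqrt(tss * var_hat)) ** 2

print(f"1 - RSS/TSS      = {r2_from_sums:.6f}")
print(f"corr(Y, Y-hat)^2 = {r2_from_corr:.6f}")
```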
Consider the results of the regression for the consumption-income relationship.
As the regression results show, there is a positive association between income and consumption.
The marginal propensity to consume (MPC) is about 0.71, suggesting that if income goes up by a dollar,
average personal consumption expenditure (PCE) goes up by about 71 cents.
The intercept value of about −184 tells us that if income were zero, PCE would be about −184 billion
dollars. A negative intercept of this kind may have no economic meaning.
The r² value of 0.9984 means that approximately 99.8 percent of the variation in PCE is explained by
variation in income (GDP). Since r² can be at most 1, we can say that the regression line fits the data
extremely well.
Degrees of Freedom: The term number of degrees of freedom (df) means the total number of
observations in the sample (= n) less the number of independent (linear) constraints or restrictions put
on them.
In other words, it is the number of independent observations out of a total of n observations. For
example, before the residual sum of squares (RSS) can be computed, β̂₁ and β̂₂ must first be obtained.
These two estimates therefore put two restrictions on the RSS, so there are n − 2, not n, independent
observations from which to compute the RSS. Following this logic, in the three-variable regression the RSS
will have n − 3 df, and for the k-variable model it will have n − k df. The general rule is:
df = n − (number of parameters estimated).
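The two restrictions in the two-variable case can be made visible numerically. In the sketch below (made-up data, least-squares fit written out by hand), the residuals sum to zero and are orthogonal to X, so only n − 2 of them are free to vary; dividing the RSS by that df to estimate the error variance is the standard use of the count and is shown here as an illustration.

```python
# Hypothetical data for a two-variable regression (made-up numbers).
x = [80, 100, 120, 140, 160, 180, 200, 220]
y = [62, 75, 88, 96, 118, 121, 140, 151]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of the slope (b2) and intercept (b1).
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b1 = y_bar - b2 * x_bar

residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

# Restriction 1 (from estimating the intercept): the residuals sum to zero.
sum_resid = sum(residuals)

# Restriction 2 (from estimating the slope): the residuals are orthogonal to X.
sum_x_resid = sum(xi * ui for xi, ui in zip(x, residuals))

# Two estimated parameters, so the RSS has n - 2 degrees of freedom.
rss = sum(ui ** 2 for ui in residuals)
df = n - 2
sigma2_hat = rss / df  # estimate of the error variance

print(f"sum of residuals     = {sum_resid:.2e}")
print(f"sum of X * residuals = {sum_x_resid:.2e}")
print(f"RSS = {rss:.3f}, df = {df}, RSS/df = {sigma2_hat:.3f}")
```

Both sums are zero up to floating-point rounding, which is why any n − 2 residuals determine the remaining two.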
p value: The probability value, also known as the observed or exact level of significance, or the exact
probability of committing a Type I error. The p value is defined as the lowest significance level at which a
null hypothesis can be rejected.
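The "lowest significance level" reading can be sketched as follows. The example assumes a test statistic that is standard normal under the null hypothesis (a large-sample z test) and a hypothetical observed value of 2.1; with a t statistic the same logic applies, with the t distribution in place of the normal.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability, under H0, of a statistic at least as extreme as |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


z_stat = 2.1  # hypothetical observed test statistic
p_value = two_sided_p_value(z_stat)
print(f"p value = {p_value:.4f}")

# H0 is rejected at significance level alpha exactly when p value <= alpha,
# so the p value is the lowest level at which H0 can be rejected.
decisions = {}
for alpha in (0.10, 0.05, 0.01):
    decisions[alpha] = "reject H0" if p_value <= alpha else "do not reject H0"
    print(f"alpha = {alpha:.2f}: {decisions[alpha]}")
```

With this statistic the null hypothesis is rejected at the 10 and 5 percent levels but not at the 1 percent level, because the p value lies between 0.01 and 0.05.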