1. Research Question
A research question must be an interesting puzzle that can be understood through some
theory and that admits falsifiable hypotheses to explain the phenomenon.
2. What kind of data/methodology will allow you to answer it?
Even with perfect data, or the ability to observe and accurately measure all the data for
the whole population, statistical analysis has limitations. Some limitations come from assumptions
inherent in the process, while others are a consequence of noise in reality. Tools for analyzing the
effects of certain factors include descriptive or comparative statistics, modeling, experiments,
in vitro analysis, regression analysis, agent-based modeling, spatial analysis, etc. These methods
form the quantitative side of possible analysis, as opposed to qualitative analysis.
Regression analysis allows for hypothesis testing, i.e., testing assumptions against empirical data,
and provides an indicator, such as a p-value, of how likely the observed outcomes would be if the
assumption were true. Given the nature of the analysis and the importance of certainty, the researcher
chooses a significance level at which to reject the null hypothesis, rejecting it when
the p-value provided by the data falls below that threshold. Lastly, consideration is given to
practical significance, which brings effect size into play in order to judge whether the
possible variation in the outcome predicted by the factor is relevant within the context. Relevance
can be a consideration of the relative possible deviation in the outcome given how manipulable the
explanatory variable is.
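As a sketch of this decision rule, a two-sided test on a sample proportion can be run with only the standard library, using the normal approximation to the binomial. The sample values and the 0.05 significance level below are purely illustrative:

```python
import math

def two_sided_p_value(phat: float, p0: float, n: int) -> float:
    """Two-sided p-value for a sample proportion under H0: p = p0,
    using the normal approximation to the binomial."""
    se = math.sqrt(p0 * (1 - p0) / n)        # standard error under H0
    z = (phat - p0) / se                     # test statistic
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for Z ~ N(0, 1)

alpha = 0.05                 # significance level chosen by the researcher
p = two_sided_p_value(phat=0.56, p0=0.50, n=1000)
reject = p < alpha           # reject H0 only when the p-value falls below alpha
print(f"p-value = {p:.4f}, reject H0: {reject}")
```

Statistical significance alone says nothing about practical significance: here the effect size (0.56 versus 0.50) must still be judged relevant in context.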
3. Regression analysis (Standard Econometrics)
Regression analysis is a set of tools that allows for hypothesis testing through comparative statics.
The basic idea is to use an economic model, or mathematical equations, as a structure to which data
are fit, uncovering the parameter estimates that best solve the problem. These
estimators represent the values that best fit the data given the specification and operator employed.
These estimates, with their associated certainty and effect-size considerations, can be used for
hypothesis testing, prediction, and other purposes such as strategy design. Generally, these
applications come from analysis at mean values and ceteris paribus, i.e., where we would
observe the outcome most of the time and when everything else is held constant.
Theory and empirical analysis are complements, as it is borderline absurd to do one
without the other. Theory is the underlying description of how something works. Any strategy or
policy that is based on science follows some understanding that comes from theory. Likewise,
until theory has been supplemented and revised through empirical analysis it cannot and should not be
applied unless necessary, as it carries high uncertainty. As elsewhere in science, theory is not to be
taken as true; it carries its worth in its practical usefulness and its resistance to being disproved.
Theory which has not been tested fails in credentials regardless of its practical usefulness.
4. Theory
A theory must be a proposed explanation of a phenomenon observed or believed to be
possible. It must be of practical usefulness and it must be falsifiable. Good theories also must work
within a domain which is set by working assumptions. The mechanisms or processes that make up the
theory must also be consistent with logic or observable behavior. Lastly, the best theories also allow
for expansion of the theoretical realm into a model that can be used to better understand the
workings of the phenomenon and to operationalize its arguments with measurable indicators.
Model Specification for Empirical Analysis
A good model operationalizes the theory in ways that allow its validity to be tested through
empirical analysis. In the case of regression analysis, that includes protecting the economic model
from omitted variables. A good economic model must include all relevant factors according to
the theory, including an error term if the model is stochastic rather than deterministic. The equation
or series of equations used in the economic model must also have the correct functional form; for
example, a linear regression with a single outcome variable as opposed to a non-linear regression
model.
5. Data and Methodology
The data and methodology depend on your economic model, but are also constrained by
what data are attainable and feasible to use. The methodology employed will most likely be a function
of the data, which in turn are influenced by the research question, theory, and chosen model. In
practice, the ultimate decision will depend on all of these factors, and not necessarily in the order
proposed. Two aspects that largely influence the methodology are previous approaches to the
issue in the literature and the quality and nature of the data.
6. Least Squares Method (Most used linear regression approach)
The least squares method is a technique in which equations are modeled with a single outcome
variable on the left and parameters on the right side. The most common form is a multiple linear
regression with an intercept coefficient,

y = β0 + β1·V.I. + β2·X + u    (Equation 1)

where y is the outcome variable (which by theory is a dependent variable), β0 is an intercept
which best fits the data when all variables take on the value of zero, β1 is the sub-vector containing
the coefficients for the variables of interest, denoted V.I., β2 is the remaining sub-vector containing
the coefficients for the remaining explanatory variables, denoted X, and lastly, u is the error term,
assumed stochastic and normally distributed.
Once the model is estimated, the theoretical values become data-estimated parameters,

ŷ = β̂0 + β̂1·V.I. + β̂2·X    (Equation 2)
However, the full potential of the OLS method is realized when the Gauss-Markov assumptions
are met: through the Gauss-Markov theorem it can then be proved that the estimates are both
efficient and unbiased, i.e., the Best Linear Unbiased Estimates (BLUE). The five Gauss-Markov
assumptions are:
a) Linearity in the functional form
b) Zero-mean error term
The distribution of the error term has mean zero (E[u] = 0).
c) Exogeneity
The explanatory variables are not correlated with the error term (Cov(X, u) = 0).
d) No Perfect Collinearity
The explanatory variables are linearly independent.
X, which denotes the n × k design matrix (explanatory variables) for n observations
and k variables, is full rank.
e) No simultaneity
The outcome variable is determined by the right-side variables, and these are not
influenced by the outcome variable.
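The full-rank condition in (d) can be checked numerically. A minimal NumPy sketch with made-up data (the notes use Stata; this is only an illustration of the rank condition itself):

```python
import numpy as np

n, k = 50, 3
rng = np.random.default_rng(0)
# Design matrix with an intercept column and two random regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

print(np.linalg.matrix_rank(X))  # equals k = 3: no perfect collinearity

# Appending a column that is an exact linear combination of others breaks full rank:
X_bad = np.column_stack([X, X[:, 1] + 2 * X[:, 2]])
print(np.linalg.matrix_rank(X_bad))  # rank stays 3 although X_bad has 4 columns
```

When the rank is below the number of columns, X'X is singular and the OLS estimator is not identified.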
Given these assumptions, the derived estimator is

β̂ = (X'X)⁻¹X'y
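A minimal NumPy sketch of this closed-form estimator on simulated data; the "true" coefficients 1, 2, and -3 are assumptions of the illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)        # stochastic, normally distributed error term
y = 1.0 + 2.0 * x1 - 3.0 * x2 + u        # true beta = (1, 2, -3)

X = np.column_stack([np.ones(n), x1, x2])  # n x k design matrix, full rank
# Solve (X'X) beta = X'y rather than inverting X'X explicitly (better conditioned)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                            # close to [1, 2, -3]
```

Because the simulated data satisfy the Gauss-Markov assumptions by construction, the estimates land near the true coefficients.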
However, there are technical considerations, as well as violations of the assumptions in reality.
Omitted variable bias: the model does not control for a variable that influences the outcome
variable.
What it does: Potentially contaminates your results by biasing the estimates.
Solution: Refer to the theory and test candidate variables to verify the robustness of
the results. If the results change, there is evidence that the model suffers from omitted
variable bias, and the variable should be included in order to fix it.
Irrelevant variables: controlling for variables that do not belong in the model.
What it does: Can inflate the explanatory power of the model.
Consequence: The model does not represent the actual analysis, and the estimates can be
altered as the model attempts to fit the data to noise.
Test: Use model comparisons and calculate AIC / BIC measures to judge the
added value of controlling for a certain variable.
In Stata: estat ic; it is also useful to save estimates and present them in tables.
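Outside Stata, the same comparison can be sketched in Python: for a linear model with Gaussian errors, AIC and BIC can be computed from the residual sum of squares (up to an additive constant). The data, and the irrelevant candidate control, are simulated for the illustration:

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    return float(e @ e)

def aic_bic(X, y):
    """Gaussian-likelihood AIC and BIC, up to an additive constant."""
    n, k = X.shape
    r = rss(X, y)
    aic = n * np.log(r / n) + 2 * k
    bic = n * np.log(r / n) + k * np.log(n)
    return aic, bic

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
noise_var = rng.normal(size=n)           # candidate control unrelated to y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, noise_var])

crit_small = aic_bic(X_small, y)
crit_big = aic_bic(X_big, y)
print("small:", crit_small)
print("big:  ", crit_big)  # the k-penalty guards against rewarding a fit to noise
```

Adding a regressor can only lower the RSS, so the penalty terms are what let the criteria flag an irrelevant control.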
Endogeneity:
What it does: Explanatory variables are correlated with the error term.
In Stata: ivregress
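The idea behind instrumental-variable estimation, which ivregress implements in Stata, can be sketched in NumPy for the just-identified case, where the estimator is (Z'X)⁻¹Z'y. The data-generating process below, with a true effect of 2, is an assumption of the illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
z = rng.normal(size=n)                      # instrument: drives x, unrelated to u
u = rng.normal(size=n)                      # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)  # x is endogenous: correlated with u
y = 1.0 + 2.0 * x + u                       # true effect of x on y is 2

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # biased: ignores Cov(x, u) != 0
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # just-identified IV: (Z'X)^(-1) Z'y
print("OLS slope:", beta_ols[1], "IV slope:", beta_iv[1])
```

OLS overstates the effect because part of u leaks into x, while the instrument recovers a slope near the true value.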
Data considerations
Heteroscedasticity:
Test: Graph the residuals; White's test for heteroscedasticity (avoid the Breusch-Pagan
LM test).
In Stata: whitetst
Solutions:
Use theory and visual interpretation or the Goldfeld-Quandt (GQ) test to identify
possible culprits and causes.
Check whether the sample is homogeneous and, if not, run separate restricted
regressions.
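A rough NumPy sketch of White's test on simulated heteroscedastic data: the squared residuals are regressed on the regressor and its square (with a single regressor there are no cross terms), and the LM statistic n·R² is compared with a chi-squared critical value; 5.99 is the 5% cutoff for 2 degrees of freedom. The error process here is made up for the illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=n)
u = rng.normal(size=n) * (0.5 + np.abs(x))  # error spread grows with |x|
y = 1.0 + 2.0 * x + u

# First-stage OLS fit and squared residuals
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ beta) ** 2

# Auxiliary regression of e^2 on x and x^2
A = np.column_stack([np.ones(n), x, x ** 2])
g = np.linalg.lstsq(A, e2, rcond=None)[0]
fitted = A @ g
r2 = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

lm = n * r2   # approximately chi-squared with 2 df under homoscedasticity
print(lm, "> 5.99 rejects homoscedasticity at the 5% level" if lm > 5.99 else "")
```

In Stata, whitetst reports the same statistic with its p-value directly.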
Autocorrelation
Solutions
Technical Considerations
Multicollinearity
Solve:
Specify and code variables such that the explanatory variables are linearly
independent and not too strongly correlated.
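One common diagnostic for how strongly correlated the explanatory variables are is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing column j on the others. A NumPy sketch on simulated, nearly collinear data:

```python
import numpy as np

def vif(X: np.ndarray, j: int) -> float:
    """Variance inflation factor for column j of the design matrix X."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)          # includes the intercept column
    beta = np.linalg.lstsq(others, y, rcond=None)[0]
    resid = y - others @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)             # independent regressor
X = np.column_stack([np.ones(n), x1, x2, x3])

print([round(vif(X, j), 1) for j in (1, 2, 3)])
```

A common rule of thumb flags VIF values above 10; here x1 and x2 are flagged while x3 is not.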
Sample size
Enough observations.
Variation in the variables.