
Name: Nihal R. Dalvi
Roll No: 08

Assignment No. 2
(Pitfalls of Data Analysis)

The Problem with Statistics


There is a pervasive notion that anything can be proved with statistics. This is true only when
statistics are used improperly. "Lies, damned lies, and statistics" is a phrase describing the
persuasive power of numbers, particularly the use of statistics to bolster weak arguments. It is
also sometimes used colloquially to cast doubt on statistics offered in support of an opponent's
point.

Sources of Bias
Bias is the tendency of a statistic to overestimate or underestimate a parameter.
Representative Sampling: Ideally, the sample is chosen by selecting members of the population
at random, with each member having an equal probability of being selected. Any departure from
random selection is therefore a source of bias.
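
To illustrate how a non-random sample can bias an estimate, here is a minimal Python sketch. The population (1,000 evenly spaced incomes), the sample size and the fixed seed are assumptions made purely for illustration; they are not taken from the assignment.

```python
import random

# Hypothetical population: 1,000 incomes sorted from lowest to highest.
population = [20_000 + 50 * i for i in range(1000)]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


population_mean = mean(population)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simple random sample: every member has the same chance of being selected.
random_sample = rng.sample(population, 100)

# Convenience sample: the first 100 records, which are all low incomes.
convenience_sample = population[:100]

print("population mean:        ", population_mean)
print("random sample mean:     ", mean(random_sample))
print("convenience sample mean:", mean(convenience_sample))
```

The random sample mean should land near the population mean, while the convenience sample systematically underestimates it.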

Statistical Assumptions: Many statistical tests assume normally distributed data. If the sample
distribution is non-normal, a transformation can be applied. However, this has dangers as well;
an ill-considered transformation can do more harm than good in terms of interpretability of
results.

Errors in Methodology
Statistical Power: The power of a test of statistical significance is the probability that it
will reject a false null hypothesis. Power is the complement of β, the probability of making a
Type II error: power = 1 − β. Statistical power is affected chiefly by the size of the effect
and the size of the sample used to detect it. Bigger effects are easier to detect than smaller
ones, and large samples offer greater test sensitivity than small samples.
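
The relationship power = 1 − β can be made concrete with a small Python sketch. It assumes a one-sided, one-sample z-test with a known standard deviation and a standardized effect size; the function name `z_test_power` and the default significance level of 0.05 are illustrative choices, not part of the assignment.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test (assumes a known standard deviation).

    effect_size is the standardized effect: (true mean - null mean) / sigma.
    """
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha)  # rejection threshold under the null
    # beta: probability that the test statistic falls below the threshold
    # even though the effect is real (a Type II error).
    beta = std_normal.cdf(critical_z - effect_size * n ** 0.5)
    return 1 - beta


# Larger effects and larger samples both raise power.
for effect_size, n in [(0.2, 50), (0.5, 50), (0.2, 200)]:
    print(effect_size, n, round(z_test_power(effect_size, n), 3))
```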

Multiple Comparisons: The multiple comparisons problem occurs when one considers a set of
statistical inferences simultaneously, or infers a subset of parameters selected on the basis of
the observed values. In certain fields it is known as the look-elsewhere effect. It arises when a
statistical analysis involves multiple statistical tests, each of which has the potential to
produce a "discovery." The more inferences are made, the more likely erroneous inferences
become.
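
A rough Python sketch shows how quickly the chance of at least one false "discovery" grows with the number of tests. It assumes the tests are independent and that every null hypothesis is true. The Bonferroni adjustment shown is one common correction; it is brought in here as an illustration and is not mentioned in the assignment.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_alpha(num_tests, alpha=0.05):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_tests


for m in (1, 5, 20, 100):
    uncorrected = familywise_error_rate(m)
    corrected = familywise_error_rate(m, bonferroni_alpha(m))
    print(m, round(uncorrected, 3), round(corrected, 3))
```

With 20 independent tests at the 0.05 level, the chance of at least one spurious result is already well above one half, while the corrected rate stays below 0.05.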

Measurement Error: Measurement error is the difference between a measured quantity and its
true value. It includes random error (naturally occurring errors that are to be expected in any
experiment) and systematic error (caused, for example, by a mis-calibrated instrument that
affects all measurements). Two characteristics of measurement that are particularly important in
psychological measurement are reliability and validity. Reliability refers to the ability of a
measurement instrument to measure the same thing each time it is used, and validity is the extent
to which the indicator measures the thing it was designed to measure. Measurement errors can
quickly grow when uncertain measures are combined in formulas. To account for this, we should
use a formula for error propagation whenever we use uncertain measures in an experiment to
calculate something else.
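
As a sketch of error propagation, the Python functions below use the standard first-order rules for a sum and a product. They assume the two measurement errors are independent and small relative to the measured values; the function names and the example measurements are hypothetical.

```python
import math


def sum_uncertainty(sigma_a, sigma_b):
    """Uncertainty of a + b (or a - b): absolute errors add in quadrature."""
    return math.hypot(sigma_a, sigma_b)


def product_uncertainty(a, sigma_a, b, sigma_b):
    """Uncertainty of a * b: relative errors add in quadrature."""
    relative = math.hypot(sigma_a / a, sigma_b / b)
    return abs(a * b) * relative


# Hypothetical measurements: length 10.0 +/- 0.3 and width 5.0 +/- 0.4.
length, sigma_length = 10.0, 0.3
width, sigma_width = 5.0, 0.4

print("perimeter half-sum uncertainty:", sum_uncertainty(sigma_length, sigma_width))
print("area:", length * width)
print("area uncertainty:", product_uncertainty(length, sigma_length, width, sigma_width))
```

Note that the relative uncertainty of the area is larger than that of either input, which is the growth the paragraph above warns about.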

Problems with Interpretation


Confusion over significance: Statistical significance is often confused with practical
importance. A reasonable way to handle this is to cast results in terms of effect sizes, so that
the size of the effect is presented in terms that make quantitative sense. A p-value merely
indicates the probability of obtaining data at least as extreme as those observed if the null
model were true, and has little to say about the size of a deviation from that model.
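
One widely used effect size is Cohen's d, the difference between two group means expressed in units of their pooled standard deviation. The assignment does not name a specific effect-size measure, so the choice of Cohen's d, the function name and the sample data in this Python sketch are assumptions for illustration.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_variance ** 0.5


# Hypothetical scores for two small groups.
treatment = [2.0, 4.0, 6.0]
control = [1.0, 3.0, 5.0]
print("Cohen's d:", cohens_d(treatment, control))
```

Unlike a p-value, this number does not shrink or grow merely because the sample size changes; it describes how large the difference is.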

Precision and Accuracy: Accuracy refers to the closeness of a measured value to a standard or
known value. Precision refers to the closeness of two or more measurements to each other. A
measurement system can be accurate but not precise, precise but not accurate, neither, or both.
It is considered valid only if it is both accurate and precise.
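
The distinction can be sketched in Python by summarizing repeated measurements of a known value: the average offset from the true value reflects accuracy, and the spread of the readings reflects precision. The function name and the readings are hypothetical, and using the mean offset and sample standard deviation as the two summaries is an assumption of this sketch.

```python
from statistics import mean, stdev


def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread): bias near 0 means accurate, small spread means precise."""
    bias = mean(measurements) - true_value
    spread = stdev(measurements)
    return bias, spread


true_value = 100.0
precise_not_accurate = [104.9, 105.0, 105.1]  # tightly clustered, but off target
accurate_not_precise = [95.0, 100.0, 105.0]   # centred on target, but scattered

print(accuracy_and_precision(precise_not_accurate, true_value))
print(accuracy_and_precision(accurate_not_precise, true_value))
```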

Causality: Causality is the agency or efficacy that connects one process (the cause) with
another process or state (the effect), where the first is partly responsible for the second, and
the second is partly dependent on the first. Statistics and economics usually apply regression
methods to pre-existing or experimental data to infer causality. The bottom line on causal
inference is that strong causal claims require random assignment; without it, an observed
association may be due to confounding factors.

Graphical Representations: The Lie Factor is the ratio of the size of the effect shown in the
graphic to the size of the effect in the data. The most informative graphics are those with a
Lie Factor close to 1. In addition, changes in the scale of the graphic should always correspond
to changes in the data being represented. Another trouble spot with graphs is multidimensional
variation, which occurs when two-dimensional figures are used to represent one-dimensional
values.
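
A short Python sketch of the Lie Factor follows, taking "size of effect" to mean relative change. The data values and circle radii are made up for illustration. The example also shows the multidimensional problem: when a value doubles and it is drawn as a circle whose radius doubles, the area the reader sees grows much faster than the data.

```python
import math


def relative_change(first, second):
    """Size of an effect, expressed as a fraction of the starting value."""
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return relative_change(graphic_first, graphic_second) / relative_change(
        data_first, data_second
    )


# Hypothetical data: a quantity doubles from 10 to 20.
data_first, data_second = 10.0, 20.0

# Drawn as bars whose length doubles (a one-dimensional encoding).
print("bar lie factor:", lie_factor(1.0, 2.0, data_first, data_second))

# Drawn as circles whose radius doubles, so the visible area quadruples.
area_first = math.pi * 1.0 ** 2
area_second = math.pi * 2.0 ** 2
print("circle lie factor:", lie_factor(area_first, area_second, data_first, data_second))
```

The bar encoding gives a Lie Factor of 1, while the circle encoding gives a value of about 3 for the same data.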

Advanced Business Analytics – II Assignment 2
