
Sampling Errors: Data-fit and Jackknife

by John Michael Williams


jmmwill@comcast.net
2015-10-15

Examples of two methods of quantifying sampling errors are presented.

Copyright (c) 2015, John Michael Williams. All rights reserved.

J. M. Williams

Sampling Errors: Fit & Jackknife

Preface
This is the somewhat rewritten and expanded text of an unpublished 1996 letter to the
editor of Computers in Physics magazine. I discovered it among the old, saved text files
which I created in SCO Unix before fully adopting Windows for offline work in about 1995.
I switched to Linux for online work, and dropped SCO entirely, a few years later.
Although written many years ago, the content remains useful as a basis for deciding how
to quantify error in the data sets that arise in physical problems.


Measuring Sampling Error


I very much liked the Mar/Apr 1996 issue of Computers in Physics, which included
several interesting examples of visualization.
Why was there no consideration of the visualization of sampling error? If a presentation
on sampling error were included, two error measures of interest would be those of (a) the
data fit itself and (b) the fit's jackknife.

By Data Fit
One generally useful visualization of a simple data-fit by a relatively smooth function
is to plot the fit as points (not as a smooth curve) on a fairly fine grid spanning the domain
of the fit. Then, the sampling error may be expressed by averaging locally each datum
with the fit. The result, which avoids error-bars and explicit data symbols, often will be a
relatively smooth curve (surface), plotted as discrete points, with "jogs" where deviant
data were fit, and with no specific point plotted where the fit was close. An example of
this approach was used by Williams and Lit in Vision Research, 23, Figs. 6-8, p. 175
(1983).
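A minimal sketch of this plotting rule, in Python, follows. It assumes that "averaging locally" means replacing the fit value at the grid point nearest each datum by the mean of that value and the datum; the function name `fit_with_jogs`, the straight-line fit, and the two data points are hypothetical, chosen only for illustration.

```python
def fit_with_jogs(fit, data, x_min, x_max, n_grid):
    """Return the fit sampled on a grid, with each datum averaged in locally.

    fit    -- callable giving the fitted value at x
    data   -- list of (x, y) measurements
    n_grid -- number of grid points spanning [x_min, x_max]
    """
    step = (x_max - x_min) / (n_grid - 1)
    xs = [x_min + i * step for i in range(n_grid)]
    ys = [fit(x) for x in xs]
    for x_d, y_d in data:
        # Grid point nearest the datum (an assumed reading of "locally").
        i = min(range(n_grid), key=lambda k: abs(xs[k] - x_d))
        # Average the datum with the fit; a deviant datum leaves a "jog".
        ys[i] = 0.5 * (ys[i] + y_d)
    return xs, ys


# Hypothetical usage: a straight-line fit y = 2x and two data points.
# The datum at x = 1 lies on the fit; the datum at x = 2 deviates from it.
xs, ys = fit_with_jogs(lambda x: 2.0 * x, [(1.0, 2.0), (2.0, 6.0)], 0.0, 4.0, 9)
```

Plotting `ys` against `xs` as discrete points then shows a jog only at x = 2, where the datum departs from the fit; the datum at x = 1 leaves its grid point on the curve.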
A simpler context was presented in Signal Detection Theory Basics (SDTB), written by
the present author and posted online at the Scribd site, at
https://www.scribd.com/doc/101998758/Signal-Detection-Theory-Basics. The data
below are described as modelling a noise distribution versus a signal-plus-noise
distribution in the response of the human retina to weak flashes of illumination.
Starting with SDTB (21),

$$\sum_{n=0}^{5} \frac{e^{-a}\,a^{n}}{n!} = 0.5 , \tag{1}$$

we arrive at SDTB Table III (partial):

   n     Pn(n)        Ps+n(n)
         mean 3.77    mean 5.65
   0     .0231        .0035
   1     .0869        .0199
   2     .1638        .0561
   3     .2059        .1057
   4     .1940        .1438
   5     .1463        .1688
   6     .0919        .1589
   7     .0435        .1283
   8     .0233        .0906
   9     .0098        .0569
  10     .0037        .0321
                                        (2)

These are the averaged data. In Table (2) above, Pn means the probability of a noise
detection for a number of events n; Ps+n means the probability of a signal-plus-noise
detection. The reader will notice that Pn is greater than Ps+n until the number of events
reaches n = 5. The details of the sampling procedure need not concern us here; we merely
note that the data of the table may be plotted as shown below:

[Plot: Probability of n Events versus Number of Events n, for Pn(n) (mean 3.77) and Ps+n(n) (mean 5.65).]

Figure 1. The Poisson-distributed data of Table (2) above.
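As a check on Table (2), the tabulated probabilities may be compared with the Poisson term of equation (1). The sketch below, in Python, assumes that 3.77 and 5.65 are the Poisson means of the two columns; the names `poisson_pmf`, `TABLE`, and `mismatches` are introduced here only for illustration.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of exactly n events for the given mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


# Table (2) as tabulated: column name -> (assumed Poisson mean, values for n = 0..10).
TABLE = {
    "Pn": (3.77, [.0231, .0869, .1638, .2059, .1940, .1463,
                  .0919, .0435, .0233, .0098, .0037]),
    "Ps+n": (5.65, [.0035, .0199, .0561, .1057, .1438, .1688,
                    .1589, .1283, .0906, .0569, .0321]),
}


def mismatches(table, tol=1e-4):
    """List (column, n, tabulated, computed) where the table and formula differ by more than tol."""
    found = []
    for name, (mean, values) in table.items():
        for n, tabulated in enumerate(values):
            computed = poisson_pmf(n, mean)
            if abs(tabulated - computed) > tol:
                found.append((name, n, tabulated, round(computed, 4)))
    return found


for row in mismatches(TABLE):
    print(row)
```

Under these assumptions, all entries but two agree with the formula to within rounding; the formula gives about .0495 for Pn(7) and about .1494 for Ps+n(4), against the tabulated .0435 and .1438.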
Typical data-fit curves drawn for the averaged data of Figure 1 are shown below in Figure
2.

[Plot: Probability of n Events versus Number of Events n, with fitted curves for Pn(n) (mean 3.77) and Ps+n(n) (mean 5.65).]

Figure 2. Simple fit (by eye) of the data of Figure 1 above.

By Jackknife
The jackknife is a trick used by statisticians to examine the effect of individual data on
the display of a curve fit or on an inferential statistic: From a sample of N data, N new
samples are formed, one for each datum, each with that one, jackknifed datum cut out. By
visualizing the fit with each datum successively removed, the effect of outliers may be
quantified. See en.wikipedia.org/wiki/Jackknife_resampling for more arithmetic details.
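The resampling step can be sketched in a few lines of Python. The sketch below forms the leave-one-out samples and applies the standard jackknife standard-error formula to a statistic; the function names and the five data values are hypothetical, and the sample mean is used only because its jackknife result is easy to verify by hand.

```python
import math


def leave_one_out(data):
    """Yield each jackknife sample: the data with one datum cut out."""
    for i in range(len(data)):
        yield data[:i] + data[i + 1:]


def jackknife(data, statistic):
    """Return (replicates, standard error) of a statistic under the jackknife."""
    n = len(data)
    replicates = [statistic(sample) for sample in leave_one_out(data)]
    center = sum(replicates) / n
    # Standard jackknife variance: (n - 1)/n times the sum of squared deviations.
    variance = (n - 1) / n * sum((r - center) ** 2 for r in replicates)
    return replicates, math.sqrt(variance)


def mean(values):
    return sum(values) / len(values)


# Hypothetical data, for illustration only.
replicates, std_err = jackknife([1.0, 2.0, 3.0, 4.0, 5.0], mean)
```

For the sample mean, the jackknife standard error reduces to the familiar s/sqrt(N); a replicate which stands far from the others marks the datum whose removal produced it as influential.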
Before proceeding further, it should be emphasized that jackknifing is primarily an
investigative tool. Jackknifed data never should be presented as part of a publication,
unless their presence is explained at least in a footnote to the data display which
contains them.
The benefit of jackknifing primarily is to identify a majority subset of the data which can
be fit accurately to a theoretical line or curve: If the jackknifed curve fit is found to be
satisfactory, the sampling can be repeated to obtain a cleaned data set. In many cases,
sampling cannot be repeated conveniently; so, the fit to the jackknifed data subset may be
overlaid on the original, unjackknifed data, again explaining what was done.


The Figure 2 data above are not especially interesting to jackknife; so, here are a few
examples which are based on the Figure 2 "X" data, with datum 8 exaggerated:

Figure 3: The Figure 2 Ps+n data, modified at datum 8.

Figure 4: The Figure 3 artificial data jackknifed at Event 1.


Figure 5: The Figure 3 artificial data jackknifed at Event 4.

Figure 6: The Figure 3 artificial data jackknifed at Event 8.


Figure 7: All three of the data jackknifed in Figures 4 - 6 removed.


Clearly, the Figure 7 data could be fit very closely to a simpler theoretical curve than
could the data of Figures 4 - 6.
Finally, here is how the final result of Figure 7 might appear, if published with the
original Figure 2 Ps+n data:

Figure 8: The Figure 7 data, publishable with a final theoretical curve.
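The procedure illustrated in Figures 3 - 8 may be sketched numerically, under stated assumptions. The Python below takes the Ps+n column of Table (2), replaces datum 8 by a hypothetical exaggerated value (0.16; the value actually used in Figure 3 is not given), and refits a Poisson mean by least squares with each datum cut out in turn. The Poisson model, the grid search, and all names are assumptions made only for this illustration; they are not taken from the figures.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of exactly n events for the given mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def fit_mean(points):
    """Least-squares Poisson mean for (n, probability) points.

    A plain grid search over means 3.00 to 9.00 in steps of 0.01;
    returns (best mean, residual sum of squares at that mean).
    """
    def rss(mean):
        return sum((p - poisson_pmf(n, mean)) ** 2 for n, p in points)

    best = min((3.0 + 0.01 * k for k in range(601)), key=rss)
    return best, rss(best)


# Ps+n column of Table (2), as tabulated, for n = 0..10.
ps_n = [.0035, .0199, .0561, .1057, .1438, .1688,
        .1589, .1283, .0906, .0569, .0321]
points = list(enumerate(ps_n))
points[8] = (8, 0.16)  # HYPOTHETICAL exaggeration of datum 8

full_mean, full_rss = fit_mean(points)

# Jackknife: refit with each datum cut out in turn.
jackknifed = {n: fit_mean(points[:i] + points[i + 1:])
              for i, (n, _) in enumerate(points)}

# The datum whose removal leaves the smallest residual is the prime suspect.
suspect = min(jackknifed, key=lambda n: jackknifed[n][1])

print("all data: mean %.2f, rss %.5f" % (full_mean, full_rss))
for n, (mean, rss) in jackknifed.items():
    print("without datum %2d: mean %.2f, rss %.5f" % (n, mean, rss))
print("suspect datum:", suspect)
```

In this sketch, only the removal of datum 8 reduces the residual sharply and returns the fitted mean to the neighborhood of 5.65; removal of any other datum leaves the fit distorted by the exaggerated value.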


Conclusion
There would seem to be great opportunity for new visualizations of sampling error in
physics, particularly as distinguished from other sources of uncertainty. Simple data-fits
are quite common, but the jackknife seems to have been used only occasionally. Perhaps
the jackknife should be employed more often, especially when occasional data values are
scattered somewhat irregularly.
