All Models are Wrong: The Problem of Undersampling for Models of Archaeological
Occupations and Its Consequences for Significance Determinations

by Jeffrey S. Alvey, Doctoral Candidate, University of Missouri Columbia
Paper presented at the 2015 Southeastern Archaeological Conference, Nashville,
Tennessee.
Abstract: Southeastern archaeologists routinely employ shovel testing as a method for
site discovery and delineation, and as a means of collecting information on the kinds and
numbers of artifacts and features present at a site. This sampling strategy is employed in
the context of Section 106 compliance, as well as in academic research. This paper
presents findings on the relationship between shovel-testing strategies and the accuracy
and usefulness of the models of archaeological occupations that result from the
information collected during shovel testing. These results suggest that some common
approaches to shovel testing lead to faulty models that fail to accurately represent
important occupational attributes, thus compromising our ability to make valid
significance determinations.
Introduction
Models are of central importance in science, and efforts to build, test, compare,
and revise them constitute a major component of that enterprise. Archaeologists often
employ models in the study of archaeological phenomena, though we often fail to
recognize or describe the process as such. I believe that much can be gained through the
recognition that despite the empirical nature of the archaeological record, we inevitably
employ ideational constructs when characterizing that record and the processes that
shaped it (Dunnell 1971). For example, while the artifactual remains used to define
archaeological occupations are empirical phenomena, the resulting occupational models
are representations of the actual settlement that deposited those materials. As
representations, they are incomplete. For all practical purposes, factors such as
preservation or sampling bias ensure this incompleteness in all cases. By accepting this
inevitable quality of archaeological inquiry, the appropriateness of the role of models in
archaeology becomes apparent. Accepting that models are by necessity provisional
encourages us to employ multiple and independent tests, which through time leads either
to increasing or decreasing confidence in a model. As additional data are brought to bear,
the model may be revised, rejected or replaced by a more useful model.
In their book Empirical Model-Building and Response Surfaces, statisticians
George Box and Norman Draper encourage us to "Remember that all models are wrong"
and note that "the practical question is how wrong do they have to be to not be useful"
(1987:74). This statement provides a useful aphorism that highlights an important point.
As researchers, we tend to overemphasize the results of particular models, and
underemphasize the processes that lead to the construction of our models and that affect
their ultimate usefulness. Thus, in the case of models of archaeological occupations, we
should acknowledge their incompleteness and ask whether they are useful in
accomplishing the task or tasks for which we construct them. In this paper I am asking
that question of models of archaeological occupations used in the context of cultural
resource management. More specifically, I am concerned with the question of how the
methods we use to construct models of archaeological occupations during Phase I survey
affect our determination of a site's eligibility for inclusion on the National Register.
The Role of Occupational Models in Cultural Resource Management
As part of the Section 106 compliance process, Phase I cultural resource surveys
employ various kinds of sampling strategies to identify and investigate cultural resources
within an area of potential effect. These efforts are undertaken for the ultimate purpose
of assessing the significance of those resources in light of the National Register of
Historic Places criteria for evaluation. Although Criteria A, B, C, and D may all be
relevant to the evaluation of archaeological sites, it is Criterion D that is most often cited
when arguing for the eligibility of archaeological sites. Criterion D refers to sites "that
have yielded, or may be likely to yield, information important in prehistory or history."
Unlike architectural resources, for example, much of what makes an archaeological site
significant is buried within the soil and effectively hidden from the researcher. Only
through the use of sub-surface sampling methods can archaeologists recover the
information needed to determine whether a site may be likely to yield information
important to the region's history or prehistory.
Central to any sampling strategy is the need for its results to sufficiently represent
the population of interest so that the research questions being asked of that population
may be accurately and effectively answered with the resulting sample. In the context of
Phase I cultural resource surveys it is the material remains left behind during the
prehistoric or historic settlement of a location that represent our population of interest.
By sampling these materials we hope to construct models of archaeological occupations
that accurately and effectively represent the attributes of non-extant settlements. Only by
accomplishing this task can we hope to assess a site's significance for advancing our
understanding of history or prehistory.
In Mississippi, as in most regions of the Southeast, shovel testing is routinely
employed as a method for site discovery and delineation, and as a means of collecting
information on the kinds and numbers of artifacts and features present at a site; however,
the details of how shovel testing is employed for these purposes vary from state to state.
For example, state guidelines vary in their requirements for spacing between shovel tests
and transects, the horizontal and vertical dimensions of shovel tests, or how shovel
testing should vary for the purposes of site discovery versus site delineation.
Shovel-testing strategies as developed in state guidelines are critically important in cultural
resource management as they affect not only success in the discovery of previously
uninvestigated resources, but ultimately determine our perceptions of sites and their
attributes, and our resulting determinations of significance. Despite this importance, little
research has focused on the processes involved in constructing occupational models
during Phase I investigations or assessments of whether the resulting models effectively
serve the purposes for which they were constructed.

Assessing the Construction of Occupational Models


Following on Dunnell's (1971:151) definition, an occupation is "a spatial cluster
of discrete objects which can reasonably be assumed to be the product of a single group
of people at a particular locality deposited over a period of continuous residence
comparable to other such units in the same study." An important point in this conception
of occupations is that these units are based on historical connections between deposition
events and not solely on spatial proximity (Dunnell 1992). However, factors such as
preservation and sampling biases, and the fact that in most studies only a small portion of
a site is excavated, make it unlikely that the entire cluster of discrete objects associated
with an occupation will be recovered. Thus, only a sample of the occupation contributes
to the construction of occupational models. Despite this incompleteness, however,
occupational models operate as central analytical units in archaeological studies as they
serve to represent the spatial, temporal, and formal attributes of some archaeological
locus.
For the remainder of this paper I will discuss a series of experiments conducted to
assess standard practice in Mississippi for the delineation and investigation of sites during
Phase I survey as set forth in the Mississippi SHPO's Guidelines for Archaeological
Investigations and Reports in Mississippi (Sims 2001). I will present evidence that a
section of the guidelines pertaining to the required sampling strategies for site delineation
and investigation is inadequate as it promotes undersampling during Phase I
investigations. This problem leads to faulty occupational models that poorly represent
the occupations under investigation. The Mississippi guidelines state:
When a positive shovel test is excavated, the testing interval should be reduced to
5 to 10 m with shovel testing continuing in a cruciform or grid pattern until two
consecutive negative shovel tests are encountered (Sims 2001:13).
While this guideline allows for shovel testing on either a cruciform or grid
pattern, consultants predominantly adopt the cruciform pattern as this strategy requires
considerably less time than shovel testing on a grid pattern. This is an important point
considering that consultants must operate within a competitive-bid environment, which
rewards strategies that reduce costs and make organizations more competitive in the
marketplace. Sampling strategies at the Phase I level should serve to accurately delineate
a site's artifact distribution, provide evidence for the presence of sub-surface features,
and lead to the recovery of an artifact assemblage that effectively represents the
occupation(s) present at a site. Whether shovel testing on a cruciform pattern satisfies
these conditions is the ultimate question here. This assessment will be made by
comparing the results obtained by applying three different sampling strategies at the
same sites (Table 1).

Table 1. Description of three sampling strategies employed in study.

                  Stage I                     Stage II                    Stage III
Method            Shovel tests (cruciform)    Shovel tests (grid)         Shovel test pits (50 x 50 cm)
Shovel test size  30 cm diameter              30 cm diameter              50 x 50 cm
Shovel test       10 m                        10 m                        10 m
spacing
Shovel test       Two transects, one          Transects established on    Transects established on
transects         established in a            a grid pattern with 10 m    a grid pattern with 10 m
                  north/south direction       spacing between each        spacing between each
                  and the other in an         transect and each shovel    transect and each shovel
                  east/west direction         test                        test
                  Shovel tests excavated      Shovel tests excavated      Shovel tests excavated
                  along each transect         along each transect         along each transect
                  until two consecutive       until two consecutive       until two consecutive
                  negative shovel tests       negative shovel tests       negative shovel tests

Methods
The Stage I strategy employs shovel testing on a cruciform pattern with the
original positive shovel test serving as the point from which all subsequent tests are
established. From this point of origin, subsequent shovel tests are dug with 10 m spacing
in north, south, east, and west directions until two consecutive negative tests are dug in
each direction. The resulting distribution of positive shovel tests establishes the site's
spatial dimensions, while the recovered artifacts are used to determine the characteristics
of the site's occupation(s). Shovel tests are 30 cm in diameter and are dug until the clay
subsoil is encountered.
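The Stage I stopping rule can be sketched in a few lines of Python. This is only an illustration of the procedure as described above, not the field protocol itself: the function name, the `is_positive` callback, the `max_steps` safety limit, and the circular toy scatter are all assumptions introduced for the example.

```python
def cruciform_tests(is_positive, spacing=10, stop_after=2, max_steps=50):
    """Dig outward from the discovery test at (0, 0) along four cardinal arms.

    is_positive is a hypothetical callback taking (east, north) in metres and
    returning True when a shovel test at that point yields artifacts.
    Returns a dict mapping each tested point to True (positive) or False.
    """
    results = {(0, 0): True}  # the discovery test is positive by definition
    for de, dn in [(0, 1), (0, -1), (1, 0), (-1, 0)]:  # north, south, east, west
        negatives = 0
        for step in range(1, max_steps + 1):  # max_steps is an assumed safety cap
            point = (de * step * spacing, dn * step * spacing)
            results[point] = is_positive(point)
            negatives = 0 if results[point] else negatives + 1
            if negatives >= stop_after:  # two consecutive negatives end the arm
                break
    return results


def toy_site(point):
    """Invented artifact scatter: a 25 m radius disc centred 10 m east, 10 m south."""
    east, north = point
    return (east - 10) ** 2 + (north + 10) ** 2 <= 25 ** 2


tests = cruciform_tests(toy_site)
positives = sorted(p for p, hit in tests.items() if hit)
print(len(tests), "tests dug,", len(positives), "positive")
```

Because only the two transects through the discovery test are sampled, any part of the scatter lying off those two lines is never tested, however dense it may be.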
The Stage II strategy employs shovel testing on a grid pattern with 10 m spacing
between all shovel tests and transects. As with Stage I, all shovel tests are 30 cm in
diameter and dug until clay subsoil is encountered. This stage differs from Stage I in that
shovel tests are dug in the cardinal directions of all positive shovel tests, rather than just
the original shovel test. As with Stage I, all transects are continued until two consecutive
negative tests are dug in all cardinal directions.
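One way to read the Stage II rule is as a search that repeats the four-arm procedure from every positive test it finds. The Python sketch below follows that reading. It is an assumption about how the rule plays out, not a transcription of the fieldwork, and the function name, the `is_positive` callback, the `max_steps` limit, and the circular toy scatter are invented for illustration.

```python
from collections import deque


def grid_tests(is_positive, spacing=10, stop_after=2, max_steps=50):
    """Extend arms in all four cardinal directions from every positive test.

    is_positive is a hypothetical callback taking (east, north) in metres.
    Each arm continues until two consecutive negative tests are met.
    Returns a dict mapping each tested point to True (positive) or False.
    """
    results = {(0, 0): True}  # the discovery test
    queue = deque([(0, 0)])   # positive tests whose arms are still to be dug
    while queue:
        east, north = queue.popleft()
        for de, dn in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
            negatives = 0
            for step in range(1, max_steps + 1):  # assumed safety cap
                point = (east + de * step * spacing, north + dn * step * spacing)
                if point not in results:  # dig each grid point only once
                    results[point] = is_positive(point)
                    if results[point]:
                        queue.append(point)
                negatives = 0 if results[point] else negatives + 1
                if negatives >= stop_after:
                    break
    return results


def toy_site(point):
    """Invented artifact scatter: a 25 m radius disc centred 10 m east, 10 m south."""
    east, north = point
    return (east - 10) ** 2 + (north + 10) ** 2 <= 25 ** 2


tests = grid_tests(toy_site)
print(sum(tests.values()), "positive tests out of", len(tests))
```

On this toy scatter the search reaches every positive point on the 10 m grid, because each newly discovered positive test becomes a new point of origin.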
Like Stage II, Stage III employs shovel testing on a 10 m grid; however, shovel
test pits 50 x 50 cm in size are dug instead of 30 cm wide shovel tests. During Stage III,
shovel test pits are placed at the locations of all positive and negative shovel tests
established during Stages I and II. Thus, the excavation of a shovel test pit involves
further excavating the area surrounding a 30 cm shovel test until a 50 x 50 cm area has
been excavated. This also means that the artifact counts tabulated for the Stage III shovel
test pits include the artifacts recovered during Stages I and II.
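The difference in sampled area between the two unit sizes is easy to work out. The short Python calculation below assumes a perfectly circular 30 cm shovel test and a perfectly square 50 x 50 cm pit, and it ignores depth, which is taken to be the same for both.

```python
import math

# Plan-view area of each unit in square metres (idealised shapes are an assumption).
round_test_area = math.pi * (0.30 / 2) ** 2   # 30 cm diameter shovel test
square_pit_area = 0.50 * 0.50                 # 50 x 50 cm shovel test pit

ratio = square_pit_area / round_test_area
print(f"shovel test: {round_test_area:.3f} m2")
print(f"test pit:    {square_pit_area:.3f} m2")
print(f"ratio:       {ratio:.2f}")
```

Under these assumptions each shovel test pit exposes roughly three and a half times the area of the shovel test it replaces, so Stage III increases the sampled area without adding new test locations.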
Results
The three-stage sampling strategy outlined above was employed at three different
prehistoric sites from north-central Mississippi. These include the Peacock 1 site in
Choctaw County and the Landrum 2 and Landrum 13 sites in Webster County (Figure 1).
The data below provide a means for comparing the results of the three strategies and their
respective effectiveness at determining the size of occupations and their artifactual
content.
Peacock 1 (22CH522)
Results from Peacock 1 show that advancing through the three stages of sampling
led to an increase in both site size and numbers of artifacts (Figures 2 and 5; Table 2).
Assemblage diversity also increased, as considerably more kinds of artifacts
were recovered during each stage (Table 3). An equally important question is the extent
to which different sampling strategies allow for the detection of sub-surface
features. During testing at Peacock 1, a midden-filled pit feature (Figure 2) was
encountered during Stage II when one of the additional shovel tests (20S10W)
encountered an area of midden soil with considerably higher artifact density. While the
feature was encountered during Stage II, greater confidence in the nature of what had
been encountered was achieved during Stage III investigations, when excavation of the
larger 50 x 50 cm test pit provided better exposure of the feature. However, I believe it
is fair to say that most researchers would have recognized that they had encountered a
sub-surface feature during Stage II investigations.
Landrum 2 (22WE511)
Investigations at the Landrum 2 site provided similar results to those from
Peacock 1 in that clear evidence exists for important increases in the numbers and kinds
of artifacts encountered during each stage, as well as an increase in site size (Figure 4 and
Table 4). However, site size did not increase from Stage II to III as no artifacts were
recovered when expanding the negative tests from Stage II into 50 x 50 cm test pits.
Diversity in artifact types increased even more during Landrum 2 investigations,
providing convincing evidence for the inadequacy of shovel testing on a cruciform
pattern (Table 5). Similar to investigations at Peacock 1, a sub-surface feature was
encountered during Stage II testing. The feature was represented by a stone-tool cache
including a heavily worked Benton point, a sandstone pestle, and multiple pieces of
grinding stones. As with Peacock 1, greater confidence in the nature of the feature was
achieved during Stage III when the 50 x 50 cm unit provided better exposure.
During Stage II, the shovel test at 10S10W exposed the sandstone pestle and two pieces
of grinding stones. When the feature was further excavated during Stage III, the Benton
point and additional pieces of grinding stones were exposed. However, similar to
Peacock 1, it was already suspected during Stage II that the cluster of artifacts exposed in
the shovel test represented a tool cache.
Landrum 13
Investigations at the Landrum 13 site provided less dramatic results than those
from Peacock 1 and Landrum 2. While there were increases in the numbers and kinds of
artifacts encountered, these increases were minimal (Tables 6 and 7). The kinds of
Table 2. Artifact tabulations by sampling stage from the Peacock 1 site (22CH522).

Stage I
Unit: 0N0E, 0N10W, 10S0E, 20S0E, 30S0E
Artifacts: Pottery, Pottery, Pottery, Pottery, Chert debitage
Count: 1, 1, 2, 2, 1
Total: 7

Stage II
Unit: 0N0E, 0N10W, 10S0E, 20S0E, 30S0E, 20S10E, 20S10W, 10S10E, 10S10W, 10S20W,
10S30W, 10S20E, 10S30E
Artifacts: Pottery, Pottery, Pottery, Pottery, Chert debitage, Pottery, Pottery,
Chert debitage, Mod. sandstone, Fired clay, Mod. sandstone, Pottery, Mod. sandstone,
Mod. sandstone, Chert debitage, Pottery, Pottery, Mod. sandstone
Count: 1, 1, 2, 2, 1, 1, 11, 3, 1, 12, 1, 5, 2, 1, 1, 1, 1, 1
Total: 48

Stage III
Unit: 0N0E, 0N10W, 10S0E, 20S0E, 30S0E, 20S10E, 20S10W, 10S10E, 10S10W, 10S20W,
10S30W, 10S20E, 10S30E, 10S40E, 0N20W, 10N0E, 20S20W, 30S30E
Artifacts: Pottery, Mod. sandstone, Pottery, Chert debitage, Mod. sandstone,
TQ debitage, Mussel shell, Pottery, Mod. sandstone, Pottery, Chert debitage,
Mod. sandstone, Fired clay, Animal bone, Charred wood, Pottery, Chert debitage,
KQ debitage, TQ debitage, Pottery, Mod. sandstone, Pottery, Mod. sandstone,
TQ debitage, Mod. sandstone, Pottery, TQ debitage, Pottery, Mod. sandstone,
Fired clay, Mod. sandstone, TQ debitage
Count: 0, 4, 1, 2, 1, 1, 1, 1, 0, 4, 2, 98, 3, 6, 11, 1, 8, 9, 2, 2, 1, 2, 1, 0, 0, 0,
1, 3, 1, 1, 2, 1, 1, 1, 1, 3, 1
Total: 177

Table 3. Variation in artifact types recovered during three-stage sampling at Peacock 1.

Stage I            Stage II           Stage III
Artifact Types     Artifact Types     Artifact Types
Pottery            Pottery            Pottery
Chert debitage     Chert debitage     Chert debitage
                   Mod. sandstone     Mod. sandstone
                   Fired clay         Fired clay
                                      Kosciusko quartzite debitage
                                      Tallahatta quartzite debitage
                                      Animal bone
                                      Charred wood
                                      Mussel shell
Total: 2           Total: 4           Total: 9

Table 4. Artifact tabulations by sampling stage from the Landrum 2 site (22WE511).
Stage I

Stage II

Stage III

Unit
0N0E

Artifacts
Chert debitage
Mod. sandstone

Count
3
1

Unit
0N0E

Artifacts
Chert debitage
Mod. sandstone

Count
3
1

0N10E

Chert debitage
Mod. sandstone
Chert debitage
Mod. sandstone
Chert debitage
Mod. sandstone
Chert debitage
TQ debitage

3
2
1
2
1
7
2
1

0N10E

Total

23

Chert debitage
Mod. sandstone
Chert debitage
Mod. sandstone
Chert debitage
Mod. sandstone
Chert debitage
TQ debitage
Mod. sandstone
Sandstone
pestle
Grinding stone
Mod. sandstone
Pottery
Chert debitage
Mod. sandstone
Chert debitage
Mod. sandstone

3
2
1
2
1
7
2
1
9
1
2
4
1
1
2
2
2

10N10W

20N10W

Chert debitage

20N10W

10S10E
20S10E
10S30W
40S0E

Mod. sandstone
TQ debitage
Chert debitage
Mod. sandstone
Mod. sandstone

1
1
1
2
2

Total

56

0N10W
0N20W
10N0E
10S0E

0N10W
0N20W
10N0E
10S0E
10S10W

10N10W
10S20W
10N20W
30S0E

Unit
0N0E

0N10E

0N10W

10S0E

0N20W

10S10E
10S20W

30S0E

40S0E

10S10W

10N0E
20S10E

10S30W
10N20W

Artifacts
Chert debitage
Quartzite
hammerstone
Mod. sandstone
Triangular point
Biface fragment
Chert debitage
Mod. sandstone
Chert debitage
TQ debitage
Mod. sandstone
Chert debitage
Mod. sandstone

Count
1
1

Chert debitage
TQ debitage
Mod. sandstone
Chert debitage
Mod. sandstone
Mod. sandstone
Stemmed
projectile point
Ft. Payne chert
debitage
Mod. sandstone
Chert debitage
TQ debitage
Mod. sandstone
Pottery
Chert debitage
Mod. sandstone
Benton point
Chert debitage
TQ debitage
Quartz debitage
Mod. sandstone
Chert debitage
Mod. sandstone
Stemmed
projectile point
Pottery
Chert debitage
TQ debitage
Fired clay
Mod. sandstone
-

5
1
8
3
3
7
1

10
7
1
2
13
0
0

Total

197

8
1
1
20
3
4
1
13
1
3

1
3
2
2
3
1
2
1
1
18
7
1
12
4
21
1

Table 5. Variation in artifact types recovered during three-stage sampling at Landrum 2.

Stage I                          Stage II                         Stage III
Artifact Types                   Artifact Types                   Artifact Types
Tallahatta quartzite debitage    Tallahatta quartzite debitage    Tallahatta quartzite debitage
Chert debitage                   Chert debitage                   Chert debitage
Mod. sandstone                   Mod. sandstone                   Mod. sandstone
                                 Pottery                          Pottery
                                                                  Fired clay
                                                                  Madison projectile point
                                                                  Benton projectile point
                                                                  Stemmed projectile point
                                                                  Sandstone pestle
                                                                  Sandstone grinding stone
                                                                  Ft. Payne chert debitage
                                                                  Quartzite hammerstone
                                                                  Quartz debitage
                                                                  Biface fragment (chert)
Total: 3                         Total: 4                         Total: 14

Table 6. Artifact tabulations by sampling stage from the Landrum 13 site.

Stage I
Unit: 0N0E, 10S0E, 0N10W
Artifacts: Pottery, Pottery, Pottery, Mod. sandstone
Count: 3, 2, 1, 2
Total: 8

Stage II
Unit: 0N0E, 10S0E, 0N10W, 10S10W
Artifacts: Pottery, Pottery, Pottery, Mod. sandstone, Pottery
Count: 3, 2, 1, 2, 1
Total: 9

Stage III
Unit: 0N0E, 10S0E, 0N10W, 10S10W, 20S10W, 30S10W, 40S10W, 50S10W, 70S10W
Artifacts: Pottery, Pottery, Pottery, Pottery, Mod. sandstone, Chert debitage, Pottery,
Pottery
Count: 0, 2, 3, 0, 3, 1, 1, 1, 1, 1
Total: 13

Table 7. Variation in artifact types recovered during three-stage sampling at Landrum 13.

Stage I            Stage II           Stage III
Artifact Types     Artifact Types     Artifact Types
Pottery            Pottery            Pottery
Mod. sandstone     Mod. sandstone     Mod. sandstone
                                      Chert debitage
Total: 2           Total: 2           Total: 3

artifacts recovered during both Stage I and II included only pottery and modified
sandstone. During Stage III only one additional type of artifact was encountered, chert
debitage. At no point during the sampling process were sub-surface features encountered.
While little changed in the numbers and kinds of artifacts encountered, advancing
through the sampling stages did lead to a substantial increase in the site's size (Figure 5).
Shovel testing along the eastern edge of the site revealed considerable ground
disturbance in this area likely associated with past timber-cutting activities. This
situation undoubtedly affected the results of the study and probably makes Landrum 13 a
less than ideal study site for my purposes. However, sites that have been impacted in this
manner are frequently encountered during cultural resource survey. This fact led me to
include the results from Landrum 13 in this study in order to allow for assessments to be
made of how sampling might affect these kinds of sites. It is also likely, however, that
Landrum 13 simply demonstrates the point that sample size is far less important when
sampling populations of low diversity. As a result of field conditions, including
disturbance to the east and the terrace landform dropping off to the west into low-lying,
wet conditions, shovel tests were not continued in these directions beyond what is shown
in Figure 5.
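The stage totals reported in Tables 2, 4, and 6 can be compared directly. The short Python sketch below expresses each later stage as a multiple of the Stage I total. The counts are those given in the tables, while the function name and the layout of the data are assumptions made for the example.

```python
# Total artifact counts for Stages I, II, and III, as reported in Tables 2, 4, and 6.
totals = {
    "Peacock 1": [7, 48, 177],
    "Landrum 2": [23, 56, 197],
    "Landrum 13": [8, 9, 13],
}


def fold_increase(counts):
    """Ratio of each later stage's total to the Stage I total."""
    base = counts[0]
    return [count / base for count in counts[1:]]


for site, counts in totals.items():
    stage2, stage3 = fold_increase(counts)
    print(f"{site}: Stage II = {stage2:.1f}x Stage I, Stage III = {stage3:.1f}x Stage I")
```

The contrast between the first two sites and Landrum 13 is the same one described in the text: the totals rise many times over at Peacock 1 and Landrum 2, and by less than a factor of two at Landrum 13.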
Summary and Conclusions
In summary, I agree with Box and Draper. All models are wrong, but some are
useful. This study provides strong evidence that site investigation and delineation by
shovel testing on a cruciform pattern leads to the creation of occupational models that
suffer from undersampling and that are simply not useful for serving their intended
purposes. I readily acknowledge that the principle that larger samples more accurately
reflect the population of interest is a truism that does not need empirical demonstration.
However, the results presented here clearly show that the problem of undersampling
leads to inaccuracy in the critical measures of artifact and feature diversity, which are
central to the evaluation of a site's eligibility for inclusion on the National Register.
Consider the results of the three examples I have presented. In each case, the results from
Stage I sampling would have led most researchers to dismiss the sites as insignificant,
and absent of diagnostic artifacts or sub-surface features. However, by the time Stage III
sampling was completed, the models of archaeological occupations of these sites had
changed dramatically and, at least in my opinion, two of the three sites are
unquestionably worthy of a recommendation as potentially eligible, if not eligible, for
inclusion on the National Register. How many supposed lithic scatters written off by
archaeologists were nothing more than products of undersampling?
The occupational models that result from surveys based on shovel testing on a
cruciform pattern are simply not useful in serving the purposes for which they are built,
and I encourage the Mississippi SHPO to strike this language from its guidelines and
insist that shovel testing on a grid pattern be employed for site delineation during Phase I
survey. When considering the effects of the various sampling stages on the measure of
artifact type diversity, we should be asking how far we are from reaching the point of
redundancy. How much larger would our sample need to be to arrive at the
asymptote where increased sampling provides nothing but redundant information? While
it is also clear that the excavation of 50 x 50 cm shovel test pits on a 10 m grid leads to
even stronger occupational models, I am sympathetic to the argument that it would be
difficult to justify requiring such a strategy at the Phase I level due to the time-consuming
nature of this approach. Thus, removing shovel testing on a cruciform pattern in favor of
shovel testing on a grid represents a sensible and defensible position.
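
The question of redundancy can be given a rough first look using the artifact-type totals in Tables 3, 5, and 7. The Python sketch below counts how many new types each stage added. It is a crude check, not a formal accumulation curve: the stages do not add equal amounts of sampling effort, and the function name and data layout are assumptions made for the example.

```python
# Artifact-type totals for Stages I, II, and III, as reported in Tables 3, 5, and 7.
richness = {
    "Peacock 1": [2, 4, 9],
    "Landrum 2": [3, 4, 14],
    "Landrum 13": [2, 2, 3],
}


def new_types_per_stage(counts):
    """Number of artifact types first recorded at each stage after Stage I."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


gains = {site: new_types_per_stage(counts) for site, counts in richness.items()}

# A site is flagged when Stage III added more new types than Stage II did,
# which suggests the sample had not yet reached the point of redundancy.
still_rising = [site for site, gain in gains.items() if gain[-1] > gain[0]]

for site, gain in gains.items():
    print(f"{site}: +{gain[0]} types at Stage II, +{gain[1]} types at Stage III")
```

If the number of new types were falling from one stage to the next, that would hint at an approaching asymptote. In these data the gain grows at every site, although the unequal effort between stages means this comparison is suggestive only.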

Figure 1. Map showing the locations of the study sites within the upper Big Black River
drainage (tan) of north-central Mississippi.

Figure 2. Midden-filled pit feature (Feature 1) from the Peacock 1 site.

Figure 3. Shovel test maps showing results of three-stage sampling at the Peacock 1 site.

Figure 4. Shovel test maps showing results of three-stage sampling at the Landrum 2
site.

Figure 5. Shovel test maps showing results of three-stage sampling at the Landrum 13
site.

Figure 6. Chart showing how site sizes change by sampling stage.

Figure 7. Chart showing how artifact counts change by sampling stage.

Figure 8. Chart showing changes in artifact type diversity by sampling stage.


References Cited
Box, George E.P., and Norman R. Draper
1987 Empirical Model-Building and Response Surfaces. John Wiley & Sons.
Dunnell, Robert C.
1971 Systematics in Prehistory. Free Press, New York.
1992 The Notion Site. In Space, Time, and Archaeological Landscapes, edited
by Jacqueline Rossignol and LuAnn Wandsnider, pp. 21-41. Plenum
Press, New York.
