Foundations In Biology: Ecology & Evolution LAB

Biology 206

LAB 1

LABORATORY 1: Measuring and describing phenotypic variation


Objectives of this laboratory:
1. To examine biological objects and to make observations of their characteristics,
   specifying traits or characters by which they vary.
2. To measure variation in some traits, and to appreciate the distinctions between traits that
   are discrete (unitary) and those that are continuous.
3. To discuss why and how to avoid bias in sampling.
4. To use descriptive statistics and a histogram to describe variation in traits.
5. To understand the basic concepts behind statistical hypothesis testing.

Introduction:
Variation is a central fact of biology. Organisms differ from one another. This variability may
represent the differences among organisms of different species, among populations within the
same species, or among individuals within a population. The variation present within a
population of organisms is an essential component of evolutionary change. When traits vary
among individuals, some individuals may survive and reproduce more successfully than others in
their current environment. The process through which certain inherited traits become more or
less common in a population because these traits are associated with reproductive success (or
fitness) is called natural selection. Analyzing variability is one way of appreciating and
understanding evolutionary and ecological processes.
Today, we will look at a variety of different organisms and attempt to identify traits, or
characters, in which they differ. While it is relatively easy to look at three different people and
say that they look different, it is more difficult to specify the exact ways in which they differ. At
first, we identify the more obvious traits: hair color, eye color, height, and so forth. But how do
we quantify or specify more subtle differences -- things like the shape of the face, body
proportions, posture, the form of the fingernails, and so forth? To measure variation, we have to
identify discrete properties, or traits, which we can evaluate in some quantitative way. This is
the basic principle behind today's lab.
Of course, there is a second level to our analysis of variation which is also important for
biological understanding. Measuring variation is the starting point. The second step is to
identify the source of the variation we see in nature. Variation can arise from two principal
sources. Much biological variation, especially among individuals of the same species, is
ENVIRONMENTAL in origin. That is, the traits vary because of differences in environmental
influences (light, water regime, nutrition, age, or similar factors) which affect the expression of
the trait. Tomato plants with more nutrients will be taller, or greener, or less spindly than those
growing in poorer soils. Baby birds within the same nest may differ in weight at fledging
because of differences in egg size, or because of differences in their ability to coerce their
parents to give them food. Flamingos that eat a regular diet of foods with carotenoid pigments
will be redder in color. Daphnia (aquatic invertebrates) at the end of the summer may be smaller
in size than those produced in early summer. All of these sources of environmental variation
are important indicators of ecological conditions. Observing and measuring variation in
environmental conditions are thus tasks to which ecologists devote considerable time and effort.
The second main source of variation is GENETIC in origin. Genetic variation results from
differences among organisms at the level of their DNA, or their basic genetic material.
Differences in nucleotide sequences (the pattern of the DNA) may result in chemical products
which differ in their form or function. These different products may affect the organism's
morphology, physiology or behavior. In this case, the genetic differences between different
organisms can be passed on to their offspring. Genetic differences are heritable and are
therefore subject to natural selection. Some traits are affected by both environmental and
genetic factors. That is, genetic variation may exist for a trait (such as leaf size), but the same
trait is also influenced by environmental factors (such as light intensity). In this case, we say that
the trait exhibits a genotype x (by) environment interaction. Different beech trees may vary in
their basic leaf size, but within the same tree, leaves in the canopy may be smaller than leaves in
the shady tree interior. Today we will consider some basic questions about variation and its role
in diversity and evolution.
Exercise 1. Variation in traits in biological materials.
Different traits or characters can be quantified in different ways. Some traits are quantified by
counting them. For such traits, the entity under observation is either present or not present, and
in some countable number. These are discrete traits, like the number of wings on an insect.
There may be 4 wings, 2 wings, or no wings, but there are never (in an undamaged insect) 3.5
wings. Similarly, a fruit fly may have 26 bristles or 27 bristles on its thorax, but not 26.73
bristles. (However, the AVERAGE number of bristles could be 26.73; an average is a statistical
quantity, not a measurement from the real world).
Other traits are measured on a continuous scale. Values for the trait in this case can take
fractional measurements. So the height of a penguin could be 48.7 cm, or 55.2 cm, or any value
in between. The number of decimal places to which this trait is measured will depend in part on
the precision of the tool with which you make the measurement.
Although discrete and continuous traits are in some ways different, they share the common
attributes that 1) they can take a wide variety of values, 2) a value of ZERO means something
real, and 3) there is a ranking in values, such that a trait with a value of 2.0 is really and
conceptually LESS THAN a trait with a value of 3.0.
There are other kinds of variables, however, that have different qualities. Ordinal variables are
variables that are ranked, but not measured in a precise way. So for Goldilocks' porridge bowls,
there was one that was too cold, one that was too hot, and one that was just right. The difference
between too cold and just right is not necessarily the same in magnitude as the difference
between too hot and just right but there is a ranking implicit in these variables. Other examples
of ordinal variables are letter grades. A B+ is a ranked value; it is less than an A-, but not
necessarily by a specific amount. What might be the reasoning for employing ordinal variables
like these? What are some other examples of ordinal variables?

Finally, there are variables that we can identify, but which have no value or rank associated
with them at all. These categorical or nominal variables include things like sex (male or female
or, in some cases, hermaphrodite), color (pink, green, and blue), presence or absence of wings,
kind of tree a nest is found in (oak, beech, or maple), and so forth. There is no sense that maples
are higher than beeches; only that they are different.
Notice that, for categorical variables, we cannot really compute an average. You cannot add
up colors and get an average color. And the same is true of ordinal variables; you can say there
were more bowls with cold porridge, but you cannot find an average (or a mean) value of
porridge type (since we categorized bowls rather than measured the temperature of each bowl).
But with quantitative variables, whether discrete or continuous, we CAN find an average
value, and we can calculate various measures about that average. So variables differ in their
information content, and in the kinds of analyses they permit.
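As a rough illustration of how the type of variable limits the analysis, the Python sketch below uses made-up values (the bristle counts and tree names are hypothetical, not lab data). A mean is meaningful for the counts, while the nominal tree names can only be tallied.

```python
from collections import Counter
from statistics import mean, mode

# Hypothetical discrete trait: bristle counts on five fruit flies.
bristle_counts = [26, 27, 27, 26, 28]

# Hypothetical nominal trait: kind of tree each nest was found in.
nest_trees = ["oak", "beech", "oak", "maple", "oak"]

# A mean can be computed for counts, and it may be fractional
# even though no single fly has a fractional number of bristles.
mean_bristles = mean(bristle_counts)

# For nominal data we can only tally categories and report the most common.
tree_tally = Counter(nest_trees)
most_common_tree = mode(nest_trees)

print(mean_bristles)      # 26.8
print(most_common_tree)   # oak
```

Trying to average the tree names would raise an error, which mirrors the point that there is no "average tree".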
Procedure:
For this part of the lab, work in groups of two or three. Walk around the lab, from station to
station, and take a look at the biological materials at each stop. Examine the objects at each
station carefully. Do you see differences among individuals, or traits in which they seem to
vary? What traits or characters could you measure or score to assess variation in these different
materials? On the other hand, some traits do NOT vary between individuals. What are some
invariant traits in these various biological materials? Complete the table on page 8 of this
handout.
Exercise 2: Describing variation in conifer needles
As an example of variation and the way we can measure it in natural materials, we will look
today at variation in pine needles. Each needle is a plant organ, developing according to a
heritable program (genetics) influenced by local conditions (environment). A pine needle
performs the critically important job of photosynthesis, producing chemical energy for the tree.
A needle's length may affect how well it functions. If the needle is too short, it may not have
enough photosynthetic tissue to produce sufficient food for the tree. On the other hand, needles
that are too long may fail to transport fluids adequately to the tip, or might accumulate too much
ice and break limbs in the winter. Limits on size and shape affecting the performance of a
biological trait are called functional constraints, and they help to explain why many species'
characteristics remain within predictable ranges.
Although needles on the same tree might be expected to conform to a genetically determined
size, differences in the particular conditions experienced by any particular needle could influence
needle length. Among a population of trees, the variation in needle length could be due to
different trees having different genes as well as experiencing a range of environmental
conditions.
Measurement allows us to consider variation quantitatively. We can use measurement and
mathematics to characterize the variation present in an individual or population and to find
differences among individuals and groups.
Consider the population of students on campus. Since the population's members vary in age,
physical condition, or genetic characteristics, we have to observe many individuals before we
can say much about the population as a group. It may be unreasonable to observe every single
student; in this case, we can use a sample of individuals to represent the whole. This poses an
interesting challenge: how many individuals must we observe to ensure that we have adequately
addressed the variation that exists in the entire population? How can this sample be collected as
a fair representation of the whole? Researchers try to avoid bias, or sampling flaws that
over-represent individuals of one type and under-represent others. If we observed students in a
hallway with a women's restroom, for example, we might draw a biased conclusion about the
gender ratio on campus overall.
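One common way to reduce sampling bias is to let a random number generator choose the individuals. The Python sketch below is only an illustration under stated assumptions: the roster of 500 student ID numbers is hypothetical, and the fixed seed is there only so that the example is repeatable.

```python
import random

# Hypothetical roster: ID numbers for 500 students (not real data).
roster = list(range(1, 501))

# A fixed seed makes the example repeatable; omit it in real sampling.
rng = random.Random(206)

# Simple random sample: every student has the same chance of selection,
# and no student can be chosen twice.
sample = rng.sample(roster, 80)

print(len(sample))        # 80
print(len(set(sample)))   # 80, so there are no repeats
```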
After collecting our sample, we can measure each individual and then use these measurements to
develop an idea about the population. If we are interested in how tall students are, we could
measure the height of each student in our sample. Reporting every single measurement in a data
table would be the most descriptive, but not very useful, because the human mind cannot easily
take in long lists of numbers. A more fruitful approach is to take all the measurements of
students and systematically construct a composite numerical description, or statistic, which
conveys information about the population in a more concise form. The average (also called the
mean) height is a familiar way to represent the size of a typical individual. The mean of any
sample is calculated as:

$$\bar{x} = \frac{\sum x}{n}$$

where x represents each measurement and n is the sample size.

We might find, for example, that the mean height of students on campus is 1.70 m (about 5 feet
7 inches), based on a sample of 80 students. The symbol $\mu$ is used for the (unknown) mean of
all the students on campus, which we are trying to estimate in our study. The symbol $\bar{x}$ is used
for the mean of our sample, which we hope to be close to $\mu$.

Means are useful, but they must be interpreted properly. A mean evokes the concept of the
"typical" student, even though there may be no student who is exactly 1.70 m tall in the
population. For this reason, it is often helpful to use more than one statistic in our description of
the typical member of a population. One useful alternative is the median, which is the
individual ranked at the 50th percentile when all data are arranged in numerical order. Another is
the mode, which is the most commonly observed height of all students in the sample.

Picturing variation
After calculating statistics to represent the typical individual, it is still necessary to consider
variation among members of the population. A histogram is a simple graphic
representation of the way individuals in the population vary. To make a histogram, the range of
measurements in a sample is divided into equal classes; then the number of measurements that
fall into each class is counted. The number of individuals falling in each class is the frequency
of that class in the sample. Figure 1 shows a sample histogram of student heights; the height
of each bar represents the number of students falling into each height class.

[Figure 1. Sample histogram of student heights. Horizontal axis: Height (m), 1.4 to 2.0. Vertical axis: Frequency, 0 to 20.]
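The mean, median, mode, and class frequencies described above can be sketched in a few lines of Python. The twelve heights are hypothetical (given in whole centimeters to keep the class arithmetic simple), and the 5 cm class width and 155 cm starting point are choices made for this example only.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical heights of 12 students, in centimeters (not real data).
heights_cm = [158, 162, 165, 165, 168, 170, 170, 170, 173, 176, 180, 183]

print(mean(heights_cm))    # 170
print(median(heights_cm))  # 170.0
print(mode(heights_cm))    # 170

# Histogram: divide the range into equal 5 cm classes and count
# how many measurements fall into each class.
class_width = 5
lowest = 155  # lower edge of the first class, chosen for this example
frequency = Counter((h - lowest) // class_width for h in heights_cm)

# Print one text bar per class, from the lowest class to the highest.
for k in range(max(frequency) + 1):
    low = lowest + k * class_width
    high = low + class_width - 1
    print(f"{low}-{high} cm  {'#' * frequency[k]}")
```

In this made-up sample the three statistics agree, which is what one expects when the distribution is roughly symmetrical; in a lopsided sample they would differ.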
Describing a pattern of variation
Notice that in our sample histogram, the most common height classes are near the middle of the
distribution (Figure 1). Extremely tall and extremely short students are rare, while intermediate
heights are more common. The skyline of the histogram fits under a bell-shaped curve that is
symmetrical, and has characteristic "shoulders" and "tails" that taper to the ends of the range in
a predictable way. Statisticians call this shape a normal distribution.
This pattern of variation is common in nature, and is encountered quite often when one effect is
influenced by many independently acting causes. Since height in humans is influenced by
gender, age, genes, and the environment, it would not be surprising to find that heights of
students on campus are normally distributed. Because the normal distribution is encountered so
commonly, many of the statistical tools ecologists use to test hypotheses assume that variations
in their data are distributed in this bell-shaped form. Models and tests based on this kind of
distribution are called parametric statistics. If the histogram of variation is lopsided, has more
than one peak, or is too broad or too narrow, then parametric tests should not be used.
Non-parametric tests have been developed for these kinds of data. Because the nature of variation in
your measurements is critical to further analysis, it is always a good idea to make a histogram
and compare your data to a normal distribution before taking your analysis any further.
How reliable is the mean that we calculate from our sample of students? First, the sample size is
critical. A mean calculated from a small sample of 10 students might be significantly off the
mark. By chance, the 10 students you measure might include a higher proportion of very tall
students, for example, than is found in the population overall. If the sample is expanded to 1000,
it is much more likely that your calculated mean will accurately reflect the population average.
A fundamental principle of data collection is that the sample size must be large enough to
minimize sampling errors due to chance departures from the population mean. (You'll notice
that throughout the semester, we will pool data from the section or from all sections for this
reason.)
How large, then, must a sample be? This depends on the amount of variation in the population.
Samples of trees taken from a tree farm where all the trees are nearly the same size will give
reliable estimates, even if the sample is small. In a natural population with a great range of sizes,
the sample has to be expanded to ensure that the larger variation is accounted for. Thus, the
more variable the population, the larger the sample must be to achieve the same level of
reliability. It becomes obvious that we need a statistic to measure variation.
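The effect of sample size can be explored with a small simulation. The Python sketch below rests on assumptions that are not part of the lab: a hypothetical population of 10,000 heights, normally distributed with a mean of 1.70 m and a standard deviation of 0.10 m, and a fixed random seed so that the run is repeatable.

```python
import random
from statistics import fmean, stdev

rng = random.Random(1)  # fixed seed so the example is repeatable

# Hypothetical population: 10,000 heights (m) drawn from a normal
# distribution with mean 1.70 and standard deviation 0.10.
population = [rng.gauss(1.70, 0.10) for _ in range(10_000)]

def spread_of_sample_means(sample_size, repeats=200):
    """Draw many samples and report how much their means vary."""
    means = [fmean(rng.sample(population, sample_size)) for _ in range(repeats)]
    return stdev(means)

small = spread_of_sample_means(10)
large = spread_of_sample_means(1000)

print(f"spread of means, n = 10:   {small:.4f} m")
print(f"spread of means, n = 1000: {large:.4f} m")
```

The means of the small samples scatter much more widely than the means of the large samples. Raising the population standard deviation in the sketch widens both spreads, which is why a more variable population calls for a larger sample.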
To measure the amount of variation around the mean, we use a statistic called the standard
deviation (s). The standard deviation is expressed in the same units as the original
measurements, which would be meters in our hypothetical student height study. A standard
deviation can thus be shown as a portion of the range of the measurements. In normally
distributed populations, 95% of all individuals fall within 1.96 (or roughly 2) standard deviations
from the mean.
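The 95% figure can be checked against the normal curve itself. This short Python sketch uses the standard library's NormalDist class; it is a numerical check under the assumption of a perfectly normal distribution, not part of the lab procedure.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Proportion of individuals within 1.96 standard deviations of the mean.
within_196 = z.cdf(1.96) - z.cdf(-1.96)

# Proportion within exactly 2 standard deviations, for comparison.
within_2 = z.cdf(2) - z.cdf(-2)

print(round(within_196, 4))  # 0.95
print(round(within_2, 4))    # 0.9545
```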
To calculate the size of a standard deviation, it is actually easier first to calculate a related
statistic called the variance. The variance is the square of the standard deviation, so we use $s^2$ for
the sample variance. Calculation of the variance is based on the difference between each
observation and the mean. If all these differences are squared and we calculate an average of the
squared values, this gives us an idea of how much the measurements are spread out around that
mean value. For a sample, the sum of squared differences is divided by n - 1 rather than n, which is
also what Excel's VAR and STDEV functions do. So variance is calculated as

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean. The standard deviation is just the square root of the variance; in other words,

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
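The variance and standard deviation calculations can be followed step by step in Python. This is a sketch under stated assumptions: the eight needle lengths are hypothetical, and the sum of squared differences is divided by n - 1, the usual convention for a sample.

```python
from math import sqrt
from statistics import stdev, variance

# Hypothetical needle lengths in millimeters (not lab data).
lengths = [60, 64, 66, 70, 70, 72, 76, 82]

n = len(lengths)
x_bar = sum(lengths) / n

# Squared difference between each observation and the mean.
squared_diffs = [(x - x_bar) ** 2 for x in lengths]

# Sample variance divides by n - 1; the standard deviation is its square root.
s2 = sum(squared_diffs) / (n - 1)
s = sqrt(s2)

print(x_bar, s2, round(s, 2))  # 70.0 48.0 6.93
```

Python's statistics.variance and statistics.stdev use the same n - 1 convention, so they return the same values as the hand calculation here.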

It's important to always report a sample mean along with a statistic representing its variation.
Without sharing with your readers the sampling methods, sample size, and the variability of the
population, there is no way for them to know how accurately the sample mean represents the
population. You should realize that it is also essential to consider the variation present within
samples whenever you want to compare two or more samples. For example, you are unlikely to
calculate exactly the same mean heights if you randomly sampled 80 students at NJIT and 80
students from Rutgers. Whether the difference between those means is biologically meaningful
or due to chance, however, will depend on the variation of measurements around those means.
In scientific research, comparisons among samples are typically made using a statistical test.
Statistical testing
Statistical tests are important scientific tools which allow us to interpret data by determining the
significance of differences between groups, or between what was expected and what was
observed, considering the number of observations and their variance. A difference is considered
statistically significant when it is found to be unlikely to be due to chance. For most basic
statistical tests, you should identify the null hypothesis (H0) that you are testing. The null
hypothesis is a sort of default explanation which states that nothing interesting has been
observed, for example: there is no difference between groups, or there is no relationship between
two variables. Statistical tests determine whether we can reject the null hypothesis
at a chosen threshold of unlikelihood, usually 5% or 0.05 in biological statistics.
If the test provides a probability or p-value equal to or less than 0.05, we have determined that
data like ours would be observed if the null hypothesis were true (in other words, due to chance) no
more than 5% of the time; that's pretty unlikely! Therefore, when p ≤ 0.05, we reject the null
hypothesis and accept an alternative hypothesis (for example, that there is a difference between
groups). If p > 0.05, we cannot reject the null hypothesis: even if it seems that you
have found a difference or a relationship, that pattern is not more than might reasonably be
observed just by chance. Be careful with the words you use to interpret your results. Note that
we "accept" or "reject" a hypothesis; or, you might say that the data "supported" or "did not support" a
hypothesis. There is always the possibility that any result could be due to chance, so we cannot
"prove" or "disprove" a hypothesis.
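The decision rule above can be written as a tiny Python function. The function name and the example p-values are hypothetical; the 0.05 threshold is the one used in this handout.

```python
def interpret_p_value(p, alpha=0.05):
    """Return the conclusion about the null hypothesis for a given p-value."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p <= alpha:
        return "reject the null hypothesis"
    return "cannot reject the null hypothesis"

print(interpret_p_value(0.03))  # reject the null hypothesis
print(interpret_p_value(0.20))  # cannot reject the null hypothesis
```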
In our lab today, we will be comparing the lengths of needles between two groups, so we will use
a parametric statistical test called the t-test. The t-test is used to determine whether there is a
significant difference in the mean value of some variable between two different groups. The null
hypothesis of a t-test is that there is no difference in the variable between the groups. The
calculations involved in a t-test consider the means, sample sizes, and standard deviations of
your samples to determine the probability of seeing a difference at least as large as yours if the
samples were drawn from populations that do not differ from one another. Using this test will allow us to quantify our confidence in the
conclusions we make about whether our two samples differ from one another.
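To show how the means, sample sizes, and standard deviations enter the calculation, here is a minimal Python sketch of the t statistic. It assumes the equal-variance (pooled) form of the two-sample t-test; the function name and the needle lengths are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(a), len(b)
    # Pooled variance: the two sample variances weighted by degrees of freedom.
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2

# Hypothetical needle lengths (mm) from two trees, not lab data.
tree_a = [60, 64, 66, 70, 70, 72, 76, 82]
tree_b = [70, 74, 76, 80, 80, 82, 86, 92]

t, df = pooled_t_statistic(tree_a, tree_b)
print(round(t, 3), df)  # -2.887 14
```

The further t lies from zero, the less likely such a difference is under the null hypothesis. Turning t and the degrees of freedom into a p-value requires the t distribution, a step that spreadsheet or statistics software carries out for you.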
Procedure:
1. Locate several coniferous trees on campus. You will need to collect needles from at least two
different individuals. You can either make a comparison between two individuals of the same
species, or between two different species. Ideally, if you are comparing species, you should
collect needles from multiple individuals of each species. (Why?)
2. Develop a sampling plan. Your sample size will be 90 needles. If you can locate multiple
individuals of the same species, spread out your collection to include roughly equal numbers of
needles from each of the trees. Decide whether you will pull needles from a live branch, or pick
fallen needles from the ground. If you collect live needles from the tree, will you always collect
from a low branch, or will you try to collect equal numbers from high, mid-height, and low
branches? If you're collecting needles from pines (genus Pinus), you will discover that their
needles come in bunches. The brown collar of tissue holding the bunch of needles together is
actually a dwarf branch, called a fascicle. The number of needles in a bunch is fairly consistent
and is useful for identification. For instance, the Eastern White Pine (Pinus strobus) typically
has five needles per fascicle, while the Red Pine (Pinus resinosa) has two. Make a decision
about which needles in the bunch you will measure. The longest one? A randomly selected one?
What will you do if you encounter a broken needle? Whatever your method, it would be best to
measure only one needle per bunch. (Why?) Pull the needles apart carefully, so as not to
introduce error by breaking the base, and measure one needle according to your predetermined
sampling plan.
3. Divide up the data collection so that you will end up with 90 measurements in each of the two
groups (species or individuals) you decide to compare. Measure each of the needles in your
sub-sample and record the lengths in millimeters (mm) on the data sheet on page 9.
4. Follow the directions on the worksheet to produce a histogram of needle lengths, to calculate
some descriptive statistics for each group, and to perform a t-test to compare needle lengths
between your two groups.

LAB 1 - WORKSHEET
Exercise 1. Variation in traits in biological materials. Write down at least 10 different traits
that you observe. You should have at least 1 trait from each specimen box (object) and at least 1
of each type of trait (see pages 2-3).

Objects              Traits that vary              Type of variable

Exercise 2. Describing variation in conifer needles


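Data Sheet: group data

Sample    Lengths of needles of group A    Lengths of needles of group B
          Length (mm)                      Length (mm)
10        ________                         ________
11        ________                         ________
12        ________                         ________
13        ________                         ________
14        ________                         ________
15        ________                         ________
16        ________                         ________
17        ________                         ________
18        ________                         ________
19        ________                         ________
20        ________                         ________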

Histogram
Follow these steps to produce a histogram illustrating the variation present in a sample:
1. Select ONE of the groups above for producing a histogram.
   Which group are you using? ________________________
2. Identify the minimum and maximum lengths in the sample. The difference between them
   is the range of lengths in your sample.
   MAXIMUM: ________ - MINIMUM: __________ = RANGE _____________
3. Divide the range into 10-20 equal segments, or size classes. For example, you could
   divide data falling between 60 and 112 mm (range of 52 mm) into segments of 4 mm.
   The first size class would include needles from 60-63 mm, the next would be 64-67
   mm, and so on up to a fourteenth class of 112-115 mm.
4. Label the horizontal axis of the histogram below with your size classes. (One size class
   per column; you may not need to use all the columns.)
5. Fill in one box in the column above each size class for each length in your sample that
   falls into that size class. Do this for every measurement in the sample you've chosen.
6. You've produced a histogram of your data! Interpret the graph by answering the
   following questions in bold type.

[Blank histogram grid. Vertical axis: Frequency. Horizontal axis: Length (mm).]
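Working out the size classes is simple arithmetic, and a short Python sketch can confirm it. The function names are hypothetical, and the sketch assumes lengths recorded in whole millimeters. Note that 13 classes of 4 mm starting at 60 mm end at 111 mm, so a 14th class is needed to hold a maximum of 112 mm.

```python
def size_classes(minimum, maximum, width):
    """Return (low, high) bounds of equal classes covering whole-mm data."""
    classes = []
    low = minimum
    while low <= maximum:
        classes.append((low, low + width - 1))
        low += width
    return classes

def class_index(length, minimum, width):
    """Which class (counting from 0) a whole-mm length falls into."""
    return (length - minimum) // width

# The example from step 3: lengths between 60 and 112 mm, 4 mm classes.
classes = size_classes(60, 112, 4)
print(len(classes))   # 14
print(classes[0])     # (60, 63)
print(classes[-1])    # (112, 115)
```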

1. Do your data appear to follow a normal distribution? Describe any differences.

2. Is it appropriate to use a t-test to test these data? Do you think a non-parametric
alternative would be a better choice?

3. Do you think you made enough measurements to be able to confidently decide your
answers to question 2 above, or do you think more data may be needed? Explain.

4. How did you attempt to avoid bias during data collection and measurement?

Describing variation
Select any eight of your needle length measurements to use to calculate some descriptive
statistics by hand. Copy those eight measurements in the column below.
Length (mm)

5. Calculate the mean length of this sample of eight needles. Show your work!

6. Calculate the variance of this sample of eight needles. Show your work!

7. Calculate the standard deviation of this sample of eight needles. Show your work.

8. What do the variance and standard deviation tell you about your population that
the mean does not tell you? Why is it important to report some measure of
variation, along with the sample size, whenever you report a calculated mean?

Using EXCEL to analyze data


Now that you understand the basics of how descriptive statistics are calculated, we'll use the
program Excel to perform the same calculations for the entire data set. Spreadsheet programs
such as Excel are very useful for organizing and analyzing scientific data. Ask your TA if you
have any questions about using this program.
Data file setup: Open a new worksheet in Excel. You should set up an Excel file that looks
something like the data sheet on page 9, except you should enter the entire class data for each
group in a continuous column (so you should end up with two columns of data, one for each
group). Note that it's not necessary to record the sample numbers, since row numbers are always
included in an Excel worksheet. Transfer your data to the Excel sheet.
Descriptive statistics
Excel can be used to calculate the descriptive statistics for your data. To use the built-in
functions in Excel, first choose an empty cell and type an equal sign (=). To calculate the mean
of your data, type AVERAGE() after the equal sign. Put your cursor between the parentheses and
select the data points that you would like to average; Excel will insert these cells into your
function (e.g., =AVERAGE(A2:A91) ). Press enter, and the average of the data you selected will
appear in the cell. Compute the average (and the following statistics) separately for each group of
data. To calculate the standard deviation, type STDEV() after the equal sign and follow the same
steps as above. To calculate the variance, type VAR() after the equal sign and follow the same
steps as above. Your TA can guide you through this process if you have any questions.
Group    Mean ($\bar{x}$)    Sample size (n)    Standard deviation (s)    Variance ($s^2$)
A
B

9. How do the mean, standard deviation, and variance that Excel calculated for the full group
differ from those you calculated by hand using only 8 measurements?
What does this tell you about the importance of sample size?

10. Which group had more variation in needle length within the group?

Statistical testing: t-TEST


We can use Excel to perform a t-test to compare needle length between our two groups. Again,
use an equal sign in a blank cell to start a new function. To perform a t-test, type TTEST()
after the equal sign (note that there are 2 T's). There are four entries for this function, and each
should be separated by a comma: array1, array2, tails, type. For array1, select the data from the
first group. For array2, select the data from the second group. We will be performing a
two-tailed test, so for tails, enter the number 2. For type, enter 2. This corresponds to the situation
where the samples are from populations with the same variance, and we have no reason to think
in advance that one tree's needles should vary in size more than another's. The final function
should look like this: =TTEST(A2:A91,B2:B91,2,2). Press enter, and your p-value will appear in
the cell. NOTE: It is OK if your two groups have different sample sizes; you don't need to
replace any missing values with 0.
11. State the null hypothesis of this test:
12. State the alternative hypothesis:

13. Results: p = ________

14. Based on this p-value, what conclusion can you draw about needle length in your
two groups?

15. Variation among members of a population can lead to natural selection, but only if
two conditions are met: First, the trait must be relevant to an individual's survival
and/or reproductive rate. Second, variation in this trait must be heritable, that is, at
least partly controlled by genes.
a. How might you design an experiment to determine the importance of needle
length in determining survival and reproduction?

b. How might you test the extent to which needle length is heritable?
