
FINE SCALE GENETIC STRUCTURE DRIVEN BY HABITAT-DEPENDENT SELECTION IN A MESOCARNIVORE

BY ROBERT C. LONSINGER, B.S.

A thesis submitted to the Graduate School in partial fulfillment of the requirements for the degree of Master of Science

Major Subject: Wildlife Science
Minor Subject: Experimental Statistics

New Mexico State University
Las Cruces, New Mexico
May 2010

Fine scale genetic structure driven by habitat-dependent selection in a mesocarnivore, a thesis prepared by Robert Lonsinger in partial fulfillment of the requirements for the degree, Master of Science, has been approved and accepted by the following:

Linda Lacey
Dean of the Graduate School

Gary W. Roemer
Chair of the Examining Committee

Date

Committee in Charge:
Dr. Gary W. Roemer
Dr. William Gould
Dr. Caitriana Steele


ACKNOWLEDGMENTS

I owe many thanks to my graduate advisor and friend, Dr. Gary W. Roemer, whose impact on me has been unsurpassed. His passion for ecology and teaching is contagious. He has provided me with support and guidance, from which I have grown both personally and professionally. His guidance and friendship have been irreplaceable.

I thank the many friends and colleagues who provided invaluable assistance and guidance. Aaron Bueno Cabrera, James Doyle, Aaron Facka, Martin Moses, Missy Powell, James Ward and Bradford Westrich each assisted in the field. Fred Armstrong, Hildy Rieser and Renee West assisted with securing funding and logistical planning. Jack Kincaid and his mules were indispensable to our backcountry stints. Funding was provided by the National Park Service and T&E, Inc. Assistantship support was provided by the Department of Fish, Wildlife and Conservation Ecology. Dr. Caiti Steele provided guidance with GIS modeling. Drs. David Daniel and William Gould provided guidance in the statistical analyses. Drs. Roemer, Gould and Steele reviewed and greatly improved this thesis.

I would like to thank my wife, Desiree Lonsinger, who endured many nights alone as I chased my ringtail quarry, for her unconditional support, both emotional and financial, and for her continued encouragement throughout. My parents instilled in me a love for wild places, for which I am truly grateful.

VITA

1979            Born in West Chester, Pennsylvania

2002            B.S. Biology (Magna cum Laude), Gannon University, Erie, Pennsylvania

2003-2004       Employed seasonally:
                Telemetry Assistant, USFWS Red Wolf Recovery
                Field Assistant, Nez Perce Tribe Gray Wolf Recovery
                Wildlife Technician, Turner Endangered Species Fund

2004-2006       Wildlife Assistant, Arizona Game and Fish Department Black-footed Ferret Reintroduction Project

2006-2010       Graduate Assistant, Department of Fish, Wildlife and Conservation Ecology, New Mexico State University

Professional Societies

Sigma Xi
The Wildlife Society
American Society of Mammalogists

Technical Publications

Facka AN, Lonsinger RC, Roemer GW (2008) Estimates of population size of Gunnison's prairie dogs in the Aubrey Valley, Arizona based on a new monitoring approach. Final report to the Arizona Game and Fish Department. 26 pp.

King C, Broecher J, Siniawski A, Lonsinger RC, Pebworth J, Van Pelt WE (2005) Results of the 2004 Black-footed Ferret Release Effort in Aubrey Valley, Arizona. Arizona Game and Fish Department, Nongame and Endangered Wildlife Program Technical Report. 20 pp.


ABSTRACT

FINE SCALE GENETIC STRUCTURE DRIVEN BY HABITAT-DEPENDENT SELECTION IN A MESOCARNIVORE

By Robert C. Lonsinger

Master of Science
New Mexico State University
Las Cruces, New Mexico, 2010
Dr. Gary W. Roemer, Chair

Habitat preferences and prey specializations influence interspecific partitioning and the distribution of species. Heterogeneity among conspecifics and the affinity of individuals to settle in habitats similar to those in which they were born may, in the absence of physical barriers to dispersal, influence the genetic structure of populations. We aimed to evaluate levels of population genetic structuring in a mesocarnivore, the ringtail (Bassariscus astutus), and hypothesized that fine-scale genetic structure could occur in this species and may be related to habitat-dependent selection that would result in genetically identifiable clusters. We used 15 microsatellite loci and two programs, STRUCTURE and GENELAND, to assess levels of population genetic structure. Our findings reveal complex hierarchical population genetic structure in the absence of physical barriers to dispersal; STRUCTURE and GENELAND identified two and six subpopulations, respectively. Discriminant function analyses (DAs) were then used to test for differences in habitat among clusters identified a priori by GENELAND. All the DAs proved to be robust, assigning a significantly high proportion (>80%) of individuals to their observed genetic cluster, indicating discriminant power that cannot be explained by chance alone. Finally, using the ringtail as a short-range dispersal generalist, we evaluated the degree of connectivity between two protected areas, Guadalupe Mountains National Park and Carlsbad Caverns National Park. Observed levels of population genetic structure could be differentiated with confidence based exclusively on habitat and landscape characteristics, suggesting that this structure is driven by habitat-dependent selection during dispersal and settlement, despite a high degree of connectivity across the study region.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS
INTRODUCTION
METHODS
    Study Area
    Genetic Sampling
    Landscape and Habitat Sampling
    Genetic Analysis
    Standard Genetic Measures
    STRUCTURE Analysis
    GENELAND Analysis
    Assessment of Habitat-Dependent Genetic Structure
RESULTS
    Trapping and Habitat Sampling
    Genetic Sampling and Standard Genetic Measures
    Bayesian Clustering Analyses
    Discriminant Analysis of Habitat-Dependent Genetic Structure
DISCUSSION
    Ringtails as a Model for Assessing Fine Scale Genetic Structure
    Discriminant Analysis of Habitat-Specific Clustering
REFERENCES
APPENDIX A: R Programming Language Code for Discriminant Analyses, Testing for Violations of Model Assumptions and Randomization Tests

LIST OF TABLES

Table
1. Range, Median, Mean and Standard Deviation of Habitat and Landscape Variables
2. Standard Genetic Measures and Tests for Hardy-Weinberg Equilibrium Across 15 Loci
3. Mean Number of Alleles Per Locus, Observed and Expected Heterozygosity, Fixation Indices and Tests of Heterozygote Deficiency for Clusters Identified by GENELAND
4. Pairwise FST Matrix for Clusters Identified by GENELAND
5. Eigenvalues, Proportion of Variation Explained, Wilks' Lambda and APER for Two Linear Discriminant Analyses
6. Scaling Coefficients of Habitat and Landscape Variables for Two Linear Discriminant Analyses

LIST OF FIGURES

Figure
1. Study Region and Ringtail Trapping Locations
2. Representation of STRUCTURE Results
3. Proportion of Individual Ancestry in Each Cluster Identified by STRUCTURE
4. Maps of Probability of Population Membership for Each of Six Clusters Identified by GENELAND
5. Maps of Probability of Population Membership for Each of Three Subdivisions of Cluster 3
6. Distribution of APERs From Randomization Tests of Two Linear and One Quadratic Discriminant Analyses
7. Scatter Plots of Individuals Against the Two Linear Discriminants with the Greatest Discrimination for Two Linear Discriminant Analyses
8. Three-dimensional Scatter Plot of Individuals Against All Three Linear Discriminants for LDA2
9. Photographs of Habitat Typically Characterizing Each of Four Clusters

ABBREVIATIONS

CAVE    Carlsbad Caverns National Park
GUMO    Guadalupe Mountains National Park
GRDL    Lincoln National Forest Guadalupe Ranger District
HWE     Hardy-Weinberg Equilibrium
MCMC    Monte Carlo Markov Chain
K       Number of Distinct Genetic Clusters
DA      Discriminant Analysis
LDA     Linear Discriminant Analysis
QDA     Quadratic Discriminant Analysis
LD      Linear Discriminant

INTRODUCTION

Individuals vary in their response to the environment they inhabit: individual trees within a species respond differently to fluctuations in light, moisture and nutrients, thereby lessening competition and perhaps contributing to high species diversity (Clark 2010); experimental manipulation of density in three-spine sticklebacks (Gasterosteus aculeatus) resulted in individuals diversifying their diets to reduce intraspecific competition (Svanbäck and Bolnick 2007); and sea otters (Enhydra lutris) differ in their ability to process foods of different size and type, leading to variable foraging strategies and diet specialization that most likely optimizes energetic return (Estes et al. 2003, Tinker et al. 2007). Understanding how diverse individuals contribute to the range of variation characterizing a population's response to a common environment, and to what degree such variation is genetically inherited or culturally transmitted, promises to link individual heterogeneity to population response and community dynamics for a greater understanding of the mechanisms driving ecological patterns (Bolnick et al. 2007).

Heritable differences among individuals in foraging or settlement strategies may influence how genes are spatially distributed across the landscape within a species. Tundra/taiga wolves (Canis lupus) are specialist predators on migratory barren-ground caribou (Rangifer tarandus groenlandicus). These wolves are behaviorally, morphologically and genetically distinct from conspecific populations of wolves that inhabit boreal forest regions to the south (Musiani et al. 2007). The boreal coniferous forest wolves are territorial, have a much lower incidence of a white coat color morph and differ from tundra wolves at three genetic markers, so much so that the two ecotypes cluster into genetically diagnosable units. Coyotes (Canis latrans) also exhibit phylogeographic structure that can potentially be explained by individual heterogeneity in dispersal preference for particular habitats (Sacks et al. 2004). The underlying premise is that animals born into a specific habitat type will preferentially search for and settle in a similar habitat when dispersing. Such tendencies would result in a landscape genetic structure that is explained by habitat-specific breaks. Coyote genetic structure determined using genetic clustering approaches was concordant with specific bioregions and supportive of habitat-specific affinities in dispersal patterns resulting in habitat-dependent selection (Sacks et al. 2004).

Each of these studies involved a generalist, highly vagile carnivore whose genetic structure was assessed across an expansive landscape. If individuals differ in their tendency to settle in habitats like those where they were born, or have learned to forage on specific prey such that dietary specialization could lead to genetic distinctiveness, then the process should be independent of scale; these processes should operate at fine scales as long as habitat heterogeneity occurs within the pertinent scale.

Ringtails (Bassariscus astutus) are small (~1 kg), nocturnal carnivores in the Family Procyonidae. The small size of ringtails suggests they have relatively limited vagility, making them an ideal model carnivore with which to assess fine-scale genetic structure and whether such structure may be explained by preferences for specific habitat types. Ranging from southern Mexico to southern Oregon, ringtails are widespread across much of the southwestern United States (Poglayen-Neuwall and Toweill 1988). Ringtails are typically associated with steep rocky terrain, canyons, or mountain slopes (Trapp 1972, Callas 1987, Ackerson and Harveson 2006), but they are capable of exploiting virtually all habitat types within their range (Lacy 1983, Poglayen-Neuwall and Toweill 1988). In the Edwards Plateau region of western Texas, nearly every type of habitat available to ringtails was occupied (Taylor 1954). Despite their ability to exploit different habitats, ringtails do not necessarily use available habitats proportionally (Lacy 1983, Yarchin 1990, Ackerson 2001), suggesting that some habitats may be preferred over others and that habitat structure may play an important role in their distribution.

Ringtail denning range and home range sizes differ both within and between the sexes. Mean denning range varied from 40 to 278 ha for males and 20 to 124 ha for females, with average distances traveled between consecutively used dens ranging from 344 to 1080 m and 284 to 628 m, respectively (Toweill and Teer 1981, Callas 1987). Reported home ranges varied from 22.7 to 139 ha for males and 16.9 to 129 ha for females (Trapp 1978, Yarchin 1990).

Here, we use the ringtail, a habitat generalist, as a model small carnivore with limited vagility to (1) evaluate levels of hierarchical population genetic structuring and (2) test for patterns of habitat-dependent clustering between genetically differentiated subpopulations. To assess population structure and connectivity, we used two Bayesian clustering techniques, implemented in the programs STRUCTURE and GENELAND, which determine the most likely number of genetically distinct subpopulations based on genetic data (Pritchard et al. 2000, Falush et al. 2003, Guillot et al. 2005). We then employed a discriminant function analysis to test for differences in habitat among clusters identified a priori by GENELAND that would support the hypothesis of habitat-specific clustering. Finally, a corollary objective was to evaluate the degree of connectivity between two protected areas, Guadalupe Mountains National Park and Carlsbad Caverns National Park, using the ringtail as a short-range dispersal generalist.

METHODS

Study Area

We live-trapped ringtails in the Guadalupe Mountains of southern New Mexico and west Texas and focused on areas both within and between Carlsbad Caverns (CAVE) and Guadalupe Mountains (GUMO) National Parks. The area between the two parks is the Lincoln National Forest's Guadalupe Ranger District (GRDL).

The Guadalupe Mountains extend from west TX northeast into southern NM to the eastern border of CAVE. The entire mountain range is approximately 110 km long and 25 km wide (Hill 1996; Figure 1). Part of an ancient fossilized reef formed during the Permian period, these mountains rise dramatically from the floor of the Delaware Basin, resulting in complex topography and steep, abrupt cliff faces along their entire length. Elevations range from 1100 m in both CAVE and GUMO to 1900 m in CAVE and 2667 m in GUMO at the summit of Guadalupe Peak.

Figure 1. Guadalupe Mountains National Park (GUMO), Carlsbad Caverns National Park (CAVE) and the Guadalupe Ranger District of the Lincoln National Forest (GRDL) are located in southeastern NM and western TX. The black circles represent locations where ringtails were successfully captured.

The Guadalupe Mountains offer a unique environment in which to examine landscape connectivity and fine scale population genetic structure because the region's convoluted topography, range of elevations and edaphic interfaces provide an array of different habitat types juxtaposed within a small geographic area (Northington and Burgess 1979). The lower elevations of CAVE and GUMO are located where the Chihuahuan Desert transitions to plains grasslands, incorporating elements of both into the region (Northington and Burgess 1979). Higher elevations support oak-juniper-piñon woodlands and coniferous forests, all of which are incised by both permanent and ephemeral riparian zones; transitional slopes incorporate characteristics of many habitats (Powell 1998).

Genetic Sampling

Ringtails were trapped using standard procedures (e.g., Roemer et al. 2000). Trapping took place from May 2006 to April 2009, inclusive. Depending on the transect size, from 6 to 29 carnivore live-traps (30 x 11 x 12; Safeguard, New Holland, PA 17557) were used. Traps were set approximately 250 m apart along transects positioned adjacent to roads, trails and washes for up to 10 nights (range = 2-10, mean = 4.54, SD = 1.69). Traps were baited with dry cat food and a scent bait, either loganberry paste or sardines; the scent bait was also placed outside the trap within one meter of the entrance. Traps were checked daily at sunrise. Ringtails were anesthetized initially using a solution of medetomidine hydrochloride (50 µg/kg) and ketamine hydrochloride (5 mg/kg) injected intramuscularly (Orion Corporation, Espoo, Finland). If sedation was incomplete, additional 0.05 ml doses of the same solution were administered sequentially. After processing, an antagonist to the medetomidine, Antisedan (atipamezole hydrochloride), was administered (~200-250 µg/kg; Orion Corporation, Espoo, Finland). Processing included the collection of a snip of ear tissue for genetic analysis, up to 10 ml of blood for disease assay, hair and standard physical measures. Individuals were marked either with the subcutaneous insertion of a Passive Integrated Transponder (PIT) tag (Biomark, Inc., Boise, ID 83702) or with an ear tag (National Band and Tag Company, Newport, KY 41072) and allowed to recover from anesthesia in the safety of the trap before being released. All animals captured were handled and released without complication in accordance with procedures sanctioned by the NMSU Institutional Animal Care and Use Committee (Permit # 2006 006).

Landscape and Habitat Sampling

Landscape and habitat characteristics were recorded at each trap location. Landscape features included slope, aspect, elevation, landform (e.g., valley, canyon, ridge) and land cover. Slope, aspect and elevation were measured with a clinometer, compass and Global Positioning System, respectively. Land cover was determined from existing vegetation maps created by the NM SWReGAP and TX GAP projects using ArcGIS (ESRI, Redlands, CA 92373). Vegetation classifications for land cover differed, with the TX GAP vegetation layer containing 21 land cover types and the NM SWReGAP layer containing 52 land cover types. The two layers were condensed into a single layer by matching land cover types based on their descriptions. The resulting layer included five major (grassland, shrubland, riparian, woodland, forest) and five minor (bare soil, sand flats, dunes with sparse vegetation, consolidated rock with sparse vegetation, cropland) cover types. This generalization of habitat types removed some of the uncertainty typically associated with remotely sensed data.

Habitat characteristics were also measured using a spoke design centered on each trapping location. The three spokes (transects) were 50 m in length, with equal angles (120°) between adjacent transects. The bearing of the first transect was selected randomly. At 5 m intervals along each transect, the plant species or microhabitat feature (e.g., bare soil, rock outcrop) that intercepted the line was recorded. For each site, each plant recorded was classified by vegetative form (tree, shrub, subshrub, forb, or grass), providing additional information on land cover.

Genetic Analysis

Tissue and blood samples collected for genetic analysis were stored in a -80 °C freezer prior to DNA extraction. A total of 153 ringtails were genotyped for fifteen tetranucleotide microsatellite markers; details regarding sample extraction, amplification and scoring can be found in Schweizer et al. (2009).
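The geometry of the spoke design used for habitat sampling can be sketched in a few lines of Python. This is an illustration only, not the field protocol itself: the function name `spoke_points`, the local coordinate frame (metres east and north of the trap) and the choice to sample at 5, 10, ..., 50 m (10 points per spoke, 30 per site, with no point at the trap itself) are all assumptions made for the example.

```python
import math
import random


def spoke_points(x0, y0, first_bearing_deg, n_spokes=3, length_m=50.0, step_m=5.0):
    """Return (spoke_index, distance_m, x, y) for every sampling point.

    Bearings are measured clockwise from north, so x is an easting and
    y is a northing relative to the trap at (x0, y0).
    """
    points = []
    n_steps = int(length_m // step_m)
    for s in range(n_spokes):
        # Spokes are separated by equal angles (120 degrees for three spokes).
        bearing = math.radians((first_bearing_deg + s * 360.0 / n_spokes) % 360.0)
        for i in range(1, n_steps + 1):
            d = i * step_m
            points.append((s, d, x0 + d * math.sin(bearing), y0 + d * math.cos(bearing)))
    return points


# Hypothetical usage: a random first bearing for a trap at the local origin.
first_bearing = random.Random(1).uniform(0.0, 360.0)
site_points = spoke_points(0.0, 0.0, first_bearing)
print(len(site_points), "sampling points; first bearing =", round(first_bearing, 1))
```

Under these assumptions each intercept represents 1/30 of the points recorded at a site, so cover percentages for a single site would fall on multiples of about 3.33%.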

Standard Genetic Measures

We calculated observed and expected levels of heterozygosity across all loci with the program SPAGEDI 1.3 (Hardy and Vekemans 2002). We calculated FIS values (Weir and Cockerham 1984) and tested for departure from Hardy-Weinberg equilibrium (HWE) across all loci using a Monte Carlo Markov Chain (MCMC) method as implemented in GENEPOP 4.0.10 (Raymond and Rousset 1995). Dememorization, number of batches and iterations per batch were increased to 10000, 500 and 8000, respectively, to achieve standard errors of <0.01 for the estimated P-values (Raymond and Rousset 1995). We tested for linkage disequilibrium in GENEPOP with dememorization, number of batches and iterations per batch increased to 10000, 1000 and 10000, respectively. Bonferroni corrections for multiple comparisons were used for the tests of HWE and for the pairwise tests for linkage disequilibrium (Rice 1989).

STRUCTURE Analysis

The program STRUCTURE 2.3.1 (Pritchard et al. 2000) has been used to evaluate levels of population structure for a wide range of taxa (Pritchard et al. 2000, Sacks et al. 2008, DeYoung et al. 2009, Pritchard et al. 2009, Stefenon et al. 2009). STRUCTURE uses multilocus genotypes to estimate Pr(X|K), the probability of the data (X) given the number of genetically distinct clusters (K), and classifies individuals into their most likely cluster (Pritchard et al. 2000). STRUCTURE does not assume or need any a priori information on population boundaries or genotype frequencies, and groups individuals into clusters that best reflect HWE and linkage equilibrium across loci (Pritchard et al. 2000, Evanno et al. 2005). We conducted the analysis in STRUCTURE using the admixture and correlated allele models.

When using STRUCTURE, it is critical to determine the length of runs needed for differentiation. Highly diagnostic genotypes require shorter runs, as do populations that are not admixed. Initially, 5-6 runs were conducted for all values of K (n = 1-15) to determine the number of burn-in repetitions and iterations required for data collection. The alpha and likelihood values were plotted against the number of iterations for each run of K to determine when they became stationary or an asymptote was reached; stationarity was reached after approximately 60000 burn-in repetitions. To infer the most likely number of clusters, we performed 20 independent runs for each K with 100000 burn-in repetitions and 250000 iterations. We calculated L(K), the mean of the log likelihood for each K, and selected the K corresponding to the maximal value of L(K) as the most probable number of clusters (Pritchard et al. 2000). L(K) does not always provide an accurate estimate of K, but ΔK, a statistic based on the second-order rate of change of L(K) between successive K values, has been found to be more reliable (Evanno et al. 2005). We inferred the most likely K from the L(K) and ΔK calculations and conducted a single run of STRUCTURE for the inferred K with 1000000 iterations to determine the proportion of ancestry (q) of individuals attributable to each of the K clusters.
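The two criteria for choosing K can be illustrated with a short Python sketch. It assumes the common formulation of the Evanno et al. (2005) statistic in which ΔK is the absolute second-order difference of the mean log likelihoods, |L(K+1) - 2L(K) + L(K-1)|, divided by the standard deviation of L(K) across replicate runs. The function name `delta_k` and the toy log-likelihood values are invented for illustration and are not results from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for each interior K.

    lnp_by_k maps K -> list of ln Pr(X|K) values, one per replicate run.
    Delta K is undefined for the smallest and largest K tested, and for
    any K whose replicate runs have zero variance.
    """
    mean_l = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in mean_l and k + 1 in mean_l:
            sd = stdev(lnp_by_k[k])
            if sd > 0:
                second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
                result[k] = second_diff / sd
    return result


# Toy values (two replicate runs per K), for illustration only.
toy_runs = {
    1: [-100.0, -102.0],
    2: [-80.0, -82.0],
    3: [-78.0, -80.0],
    4: [-77.0, -79.0],
}
toy_delta = delta_k(toy_runs)
best_k_by_delta = max(toy_delta, key=toy_delta.get)
best_k_by_mean = max(toy_runs, key=lambda k: mean(toy_runs[k]))
print("Delta K:", toy_delta)
print("K by Delta K:", best_k_by_delta, "| K by maximal L(K):", best_k_by_mean)
```

In the toy data the mean log likelihood keeps creeping upward with K, so maximal L(K) picks the largest K tested, whereas ΔK peaks where the improvement levels off. This is the kind of disagreement that motivates reporting both criteria.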


GENELAND Analysis GENELAND 3.1.5 was implemented in the R statistical programming language (R Development Core Team 2005) and differs from STRUCTURE in that it assumes that individuals located closer to one another are more likely to be related than those found further apart, to this end, it incorporates geographic location into the model as priors (Guillot et al. 2005). The inclusion of geographic location into the admixture model results in an alteration of the assignment of individuals. Rather than an individual with equal ancestry in two populations being equally likely to be assigned to either population as with STRUCTURE, the same individual would be assigned with higher probability to a cluster of individuals it was more proximally located to. The model still allows for highly intricate spatial domains, however, and can cluster individuals together that are not in close proximity. We ran 10 independent runs of GENELAND, each time inferring K from the same range of clusters (K = 1 15). The number of iterations was set to 1000000 and the thinning was set to 100, retaining 10000 of the iterations. The burn-in was set to 1000, consequently discarding the first 1000 of the 10000 retained iterations to minimize the effects of the random starting configuration. GENELAND was implemented using the correlated allele and spatial models (Guillot et al. 2005, Guillot 2008). We attached an uncertainty of 5 m to the spatial coordinates of each individual which corresponded to the average error of the GPS unit, allowing individuals captured at the same location to be assigned to different clusters. All other
11

default values were retained as recommended. The run with the highest average posterior probability was selected for subsequent analyses. Maps of posterior probability of population membership were generated with a burn-in of 200; pixels in the spatial domain for the x and y axes were set to 101 and 129, respectively, to have the same resolution on both axes relative to the study area. We assessed levels of genetic diversity for each of the clusters identified by GENELAND, calculated FIS values (Weir and Cockerham 1984) and tested for departures from HWE with Bonferroni corrections as before. Assessment of Habitat-Dependent Genetic Structure Using the genetically determined clusters delineated with GENELAND as a priori groups, we examined if we could confidently discriminate these clusters based on some combination of habitat and landscape characteristics. Our central premise was that if individuals were dispersing and settling based on heritable differences in habitat preference, that such groups should be able to be delineated using both their genetic distinctiveness as well as their preference for particular habitats. While there are numerous statistical models available for classification, discriminant analysis (DA) has long been used to identify variables that can be used to differentiate groups and to develop predictive models. Making it even more appealing, DA has been studied in detail, providing a knowledge base to understand the effects of unequal group sample size, outliers and departures from model assumptions, such as

12

multivariate normality and equality of variance-covariance matrices (Riani and Atkinson 2001, Pohar et al. 2004, Sever et al. 2005, Finch and Schneider 2005, 2007). We used both linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) to discriminate among the a priori groups. Although both the LDA and QDA models allow for unequal sample sizes among groups, the sample size of the smallest group has to exceed the number of predictor variables. Using GENELAND, we identified six genetically distinct clusters (see Results). Clusters 1 and 6 had sample sizes smaller than the number of predictor variables and were consequently excluded from further analyses. LDA and QDA assume nonmulticollinearity among predictor variables (Tabachnick and Fidell 1996). We calculated correlation coefficients for all 55 pairwise comparisons of predictor variables and found seven correlations with levels ranging from only |0.26| to |0.36|. Correlations below 0.5 do not have a significant impact on model performance (Pohar et al. 2004, Finch and Schneider 2007). LDA and QDA are robust to departures in multivariate normality so long as the departure is related to skewness and not to outliers (Tabachnick and Fidell 1996, Sever et al. 2005, Finch and Schneider 2007); outliers were identified using Robust Mahalanobis Distance methods and also removed (Nordhausen et al. 2008). The LDA assumes multivariate normality of the predictor variables and homogeneity of covariance matrices among the different groups. We tested the assumption of homogeneity among covariance matrices using the Box 2 test (Box 1949) with a Bonferroni correction for multiple comparisons

13

(Rice 1989). The Box 2 test indicated that the covariance matrix of cluster 2 was not equal to the matrices of the other three clusters. When the assumption of homogeneity of covariance matrices is not satisfied, QDA performs better than LDA (Riani and Atkinson 2001, Finch and Schneider 2007), however, it has been suggested that this test is too strict and thus acceptance of the null hypothesis of equality between covariance matrices is difficult to meet (Tabachnick and Fidell 1996). In keeping with this view, we performed three different discriminant analyses. The first, LDA1 was conducted using all four clusters with observed sample size, the second was a QDA with these same data and the final was a modified LDA (LDA2) where sample sizes were drawn at random so they were more equitable. The covariance matrix of cluster 2 was determined to be significantly different (P = 0.001) from the other three clusters and because cluster 2 (n2 = 58) was disproportionally larger than the other three clusters (n3 = 19, n4 = 28, n5 = 13), we suspected that this might be the cause of the disparity in covariance matrices (Finch and Schneider 2007). To reduce this difference, we randomly selected a sample of individuals from cluster 2 equal in size to the mean of the other three clusters (n2s = 20). We retested the assumption of homogeneity among covariance matrices using the Box 2 test with a Bonferroni correction and found the assumption of homogeneity in covariance matrices was met; subsequent analyses on this modified dataset constituted LDA2. Each LDA was performed using 11 habitat and landscape characteristics defining the remaining four clusters (#s 2, 3, 4, 5). QDAs perform less favorably
14

with categorical variables (Finch and Schneider 2007), so we removed Habitat Type and ran the QDA with the remaining 10 predictor variables. To compare the performance of each DA, we calculated the apparent classification error rate (APER), or misclassification rate, using the leave-one-out method (Sever et al. 2005). This method generates the predictive equations based on n − 1 individuals and uses the resultant equations to classify the omitted individual. The procedure is repeated n times, each time excluding a different individual. We then compared the results of the three different DAs through randomization. We conducted 100,000 randomizations in which each individual was assigned to a random group whose sample size was in proportion to the sample size of the original genetic clusters, a DA was performed and the APER was calculated. The APER of our original observed data was then compared to the distribution of APERs generated via randomization. We also compared the results of each LDA by graphically depicting the distribution of points on the first two discriminant functions, which helps in visualization and interpretation of the results (Pardoe et al. 2007). Calculation of APER using the leave-one-out method may provide optimistic results, and cross-validation has been proposed to confirm findings (Finch and Schneider 2007). We randomly divided the data into training and validation datasets comprising approximately 80% and 20% of the original dataset, respectively. Using the same LDA1 and QDA models as before, we used the training data to generate the predictive equations and the validation data to estimate the APER. All analyses and randomization tests were performed in the R programming language and software environment (R Development Core Team 2005; Appendix A).
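The leave-one-out error rate and the randomization comparison described above can be sketched as follows. This is an illustration only, not the code of Appendix A: it is written in Python rather than R, it swaps in a simple nearest-centroid classifier as a stand-in for the LDA and QDA models, the data are hypothetical, and the one-sided p-value (the share of randomized APERs at or below the observed APER) is an assumed way of summarizing the comparison.

```python
import random


def nearest_centroid_predict(train_x, train_y, point):
    """Assign `point` to the group whose mean vector is closest (squared Euclidean).

    Stand-in classifier (assumption): the thesis used LDA and QDA, not this rule.
    """
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    best_label, best_dist = None, float("inf")
    for label in sorted(sums):
        centroid = [s / counts[label] for s in sums[label]]
        dist = sum((a - b) ** 2 for a, b in zip(point, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_aper(xs, ys):
    """Leave-one-out error rate: fit on n - 1 individuals, classify the omitted one."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)


def randomization_test(xs, ys, n_perm=200, seed=1):
    """Shuffle group labels (group sizes are preserved) and recompute the error rate."""
    rng = random.Random(seed)
    observed = loo_aper(xs, ys)
    shuffled = list(ys)
    null_apers = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_apers.append(loo_aper(xs, shuffled))
    # Assumed summary: share of randomized APERs that are as low as the observed one.
    p_value = (1 + sum(a <= observed for a in null_apers)) / (n_perm + 1)
    return observed, null_apers, p_value


# Hypothetical (elevation in m, % tree cover) values for two made-up groups.
demo_x = [[1150.0, 5.0], [1200.0, 8.0], [1300.0, 6.0],
          [2400.0, 40.0], [2500.0, 45.0], [2450.0, 50.0]]
demo_y = [2, 2, 2, 5, 5, 5]
observed, null_apers, p_value = randomization_test(demo_x, demo_y)
print(f"observed APER = {observed:.3f}, "
      f"mean randomized APER = {sum(null_apers) / len(null_apers):.3f}, "
      f"p = {p_value:.3f}")
```

Shuffling the labels keeps each randomized group at the size of the corresponding original cluster, which mirrors the proportional group sizes used in the randomizations. Predictors are not standardized in this sketch, so the variable with the largest numeric range dominates the distance.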

RESULTS

Trapping and Habitat Sampling

A total of 20 trapping sessions were conducted across the study area incorporating 314 trap locations established on 19 separate transects. A total of 1425 trap nights occurred between May 25, 2006 and April 6, 2009, resulting in the capture of 157 individual ringtails and 30 recaptures. Genetic samples were obtained from 153 individuals. Other mesocarnivores captured included 13 gray foxes (Urocyon cinereoargenteus), 11 striped skunks (Mephitis mephitis), 7 spotted skunks (Spilogale gracilis), 3 raccoons (Procyon lotor), 1 hog-nosed skunk (Conepatus mesoleucus) and 1 opossum (Didelphis virginiana). Landscape and habitat characteristics were recorded at each trapping location, providing information on land cover (Table 1).

Genetic Sampling and Standard Genetic Measures

The number of alleles ranged from 7 to 34 (mean = 16.67, SD = 7.58) across all 15 loci (Table 2). Multilocus genotypes were obtained for all individuals except for five individuals that were missing data at a single locus. Both the number of alleles and the observed heterozygosity indicated high levels of genetic diversity.

Global tests for heterozygote deficiency indicated departure from HWE (P < 0.0001) for three loci, GATA5, GATA105 and AAAG30 (Table 2). These loci were retained in subsequent analyses, as GENELAND is robust to potential causes of heterozygote deficiency (Coulon et al. 2006, Pilot et al. 2006). There were no significant departures from linkage equilibrium following corrections for multiple comparisons.

Table 1. Minimum, maximum, median, mean and standard deviation (SD) of 10 variables recorded at each trap location where a ringtail was captured.

Variable        Minimum    Maximum     Median       Mean        SD
No. Species        5         21.00      13.00      12.55      2.78
Simpsons_SDI       0.55       0.98       0.93       0.98      0.05
%Trees             0         60.00       6.67      11.30     14.02
%Shrubs            0         73.33      40.00      38.56     17.70
%Subshrubs         0         55.00       3.33       7.09      8.49
%Grass             0         63.33      16.66      19.89     13.42
%Forbs             0         30.00       0.00       3.49      5.28
%Rock              2         60.00      20.00      24.26     15.08
Elevation       1135       2631.00    1654.00    1679.75    372.20
Slope (Deg.)       0         56.00       9.00      12.75     10.48

Habitat type (a categorical variable) and aspect were also recorded.

Table 2. Number of alleles (Na), observed heterozygosity (HO), expected heterozygosity (HE), fixation index (FIS) and the P-value and standard error (SE) for the test of Hardy-Weinberg proportions for 15 microsatellite loci across the total population.

Locus       Na    HO        HE        FIS        P-value    SE
GATA5        9    0.3355    0.6236     0.4627    <0.0001    <0.0001*
GATA47      10    0.7974    0.8017     0.0053     0.5344     0.0041
GATA105      7    0.5098    0.6318     0.1936    <0.0001    <0.0001*
AAAG2       34    0.8627    0.9495     0.0916     0.0061     0.0015
AAAG3       21    0.8562    0.9180     0.0675     0.1724     0.0060
AAAG20      17    0.8889    0.8623    -0.0310     0.2309     0.0069
AAAG22      10    0.8105    0.8383     0.0333     0.1452     0.0024
AAAG28-2    15    0.5752    0.5839     0.0150     0.4736     0.0134
AAAG30      26    0.7778    0.9164     0.1517     0.0016     0.0007*
AAAG36      11    0.8487    0.8535     0.0057     0.4095     0.0040
AAAG45      16    0.7867    0.8587     0.0841     0.2045     0.0058
GATA73      14    0.8431    0.8732     0.0345     0.3972     0.0052
AAAG81      26    0.9150    0.9097    -0.0059     0.8444     0.0077
AAAG84      12    0.7190    0.7940     0.0947     0.0447     0.0020
AAAG89      21    0.9020    0.9142     0.0135     0.5857     0.0084

* Not in Hardy-Weinberg proportions following Bonferroni corrections
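As a worked check on the footnote, the sketch below applies a simple Bonferroni threshold to the Table 2 P-values. The significance level of 0.05 and the single-step (non-sequential) form of the correction are assumptions, not details confirmed by the text; entries reported as "<0.0001" are entered at their upper bound of 0.0001.

```python
# P-values copied from Table 2; "<0.0001" is entered as its upper bound 0.0001.
TABLE2_P = {
    "GATA5": 0.0001, "GATA47": 0.5344, "GATA105": 0.0001, "AAAG2": 0.0061,
    "AAAG3": 0.1724, "AAAG20": 0.2309, "AAAG22": 0.1452, "AAAG28-2": 0.4736,
    "AAAG30": 0.0016, "AAAG36": 0.4095, "AAAG45": 0.2045, "GATA73": 0.3972,
    "AAAG81": 0.8444, "AAAG84": 0.0447, "AAAG89": 0.5857,
}


def bonferroni_flags(p_values, alpha=0.05):
    """Return the per-test threshold (alpha / number of tests) and the tests below it.

    alpha = 0.05 is an assumed significance level.
    """
    threshold = alpha / len(p_values)
    flagged = sorted(name for name, p in p_values.items() if p < threshold)
    return threshold, flagged


threshold, flagged = bonferroni_flags(TABLE2_P)
print(f"per-locus threshold = {threshold:.5f}")
print("loci below threshold:", ", ".join(flagged))
```

Under these assumptions the threshold is 0.05 / 15, and the loci that fall below it are the same three marked with an asterisk in Table 2.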

Bayesian Clustering Analyses

The STRUCTURE analysis indicated maximal values of L(K) and ΔK at K = 2, suggesting that there were two unique population clusters (Figure 2). At K = 2 a large proportion of individuals (n = 144) showed high levels of ancestry (q ≥ 0.80) to only one of the clusters (Figure 3). Seven of the nine remaining individuals were grouped at a second locale at the southwestern edge of GUMO, while the other two were found between the parks in GRDL.

Figure 2. The most probable number of genetically distinct clusters of ringtails (K = 2) in the Guadalupe Mountains of west Texas and southern New Mexico using the mean maximal likelihood value [L(K)] and the second order rate of change (ΔK). Summarized from 20 runs of STRUCTURE at each of K = 1 to 15 clusters.
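The second order rate of change named in the caption can be sketched as below, following the definition in Evanno et al. (2005): the mean absolute second difference of the log-likelihood across runs, divided by the standard deviation of the log-likelihood at that K. The log-likelihood values are invented for illustration and are not the STRUCTURE output of this study; the function name and input layout are likewise assumptions.

```python
from statistics import mean, stdev


def delta_k(runs_by_k):
    """Compute delta-K for every K that has results at K - 1 and K + 1.

    runs_by_k maps K to a list of log-likelihoods, one per run, with the
    same number of runs for every K (hypothetical input layout).
    """
    result = {}
    for k in sorted(runs_by_k):
        if k - 1 not in runs_by_k or k + 1 not in runs_by_k:
            continue  # undefined at the smallest and largest K tested
        second_diff = [
            abs(up - 2 * mid + down)
            for down, mid, up in zip(runs_by_k[k - 1], runs_by_k[k], runs_by_k[k + 1])
        ]
        result[k] = mean(second_diff) / stdev(runs_by_k[k])
    return result


# Made-up log-likelihoods for three runs at each of K = 1..4.
toy_runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4800.0, -4803.0, -4797.0],
    3: [-4790.0, -4795.0, -4785.0],
    4: [-4788.0, -4780.0, -4796.0],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
print({k: round(v, 2) for k, v in dk.items()}, "largest at K =", best_k)
```

The statistic is undefined at the smallest and largest K tested, and the division fails if every run at a given K returns an identical likelihood, so real output would need that case handled.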

Figure 3. The proportion of each individual's ancestry (q) attributable to each of K = 2 clusters resulting from the program STRUCTURE. Each vertical bar represents 1 of the 153 individuals. The horizontal line represents the separation of the 2 clusters. Cluster 1 contains 144 individuals; cluster 2 contains 9 individuals.

The 10 independent runs of GENELAND consistently displayed a clear mode at K = 6, revealing more fine scale genetic structure than was evident using STRUCTURE (Figure 4). We observed similar levels of genetic diversity in each of the six clusters identified by GENELAND (Table 3). Only cluster 2 and cluster 3 had a significant deficiency in heterozygosity, potentially indicating cryptic subdivision (Table 3). Pairwise FST values ranged from 0.01 to 0.07, indicating weak to moderate genetic differentiation among clusters (Table 4). To investigate cryptic subdivisions in clusters 2 and 3, we tested for additional subdivisions allowing K to vary between 1 and 6. We did not discover any cryptic substructuring within cluster 2, as GENELAND converged at K = 1 in nine of 10 runs. Cryptic substructure was discovered in cluster 3, however, with all 10 runs indicating K = 3 subdivisions (Figure 5). Pairwise FST values among these subdivisions ranged from 0.04 to 0.09, indicating weak to moderate, but significant, genetic differentiation. Cluster 3.1 corresponded strongly to the second group of nine individuals identified by STRUCTURE, in that it contained six of the seven individuals that had been grouped at the second locale in southwestern GUMO.

Figure 4. Maps of K = 6 clusters indicating high (light color) to low (dark color) probability of population membership for individuals (black circles). Note cluster numbers in the lower right hand corner of each panel.

Table 3. Average number of alleles per locus (A), observed heterozygosity (HO), expected heterozygosity (HE), fixation index (FIS) and the P-value and standard error (SE) for tests of heterozygote deficiency for each of six clusters identified by GENELAND.

Cluster    A        HO        HE        FIS       P-value    SE
1           4.87    0.7191    0.7211    0.0015     0.3363     0.0056
2          14.47    0.7396    0.8082    0.0877    <0.0001    <0.0001*
3          10.87    0.7846    0.8324    0.0469     0.0033     0.0007*
4          12.00    0.7848    0.8177    0.0386     0.0277     0.0033
5           8.67    0.7846    0.7961    0.0196     0.6265     0.0102
6           4.53    0.7500    0.8048    0.0897     0.2384     0.0034

* Significant deficit in heterozygotes following Bonferroni corrections

Table 4. Pairwise FST values (below the diagonal) and the P-value of Fisher's exact probability test implemented in GENEPOP (above the diagonal).

Cluster    1         2          3          4          5          6
1          -         0.0041     0.0009     0.0007    <0.0001     0.0001
2          0.0326    -         <0.0001    <0.0001    <0.0001     0.0005
3          0.0394    0.0172     -         <0.0001    <0.0001     0.0120
4          0.0408    0.0083     0.0175     -         <0.0001     0.0007
5          0.0495    0.0135     0.0244     0.0227     -          0.0042
6          0.0702    0.0345     0.0259     0.0402     0.0546     -

Figure 5. Maps of 3 cryptic subdivisions in Cluster 3 indicating high (light color) to low (dark color) probability of subdivision membership for individuals (black circles).

Discriminant Analysis of Habitat-Dependent Genetic Structure

Both the LDA1 and QDA models reached similar conclusions, classifying greater than 80% of the individuals correctly into their genetically identified cluster based on habitat and landscape variables. The LDA1 model resulted in an APER of only 18.64% (Wilks' Λ = 0.1865, P < 0.001; Table 5), whereas the APER of the QDA model was 19.49%. The APER values of the observed data were significantly different compared to the distribution of APERs from the randomization exercises, which ranged from 32.20% to 57.63% (mean = 46.79%) for LDA1 and from 43.22% to 83.05% (mean = 66.27%) for the QDA (Figure 6). Furthermore, Wilks' Λ from the LDA randomizations ranged from 0.4765 to 0.9170 (mean = 0.7417). Of the three linear discriminants identified by LDA1, over 75% of the variation was explained by the first discriminant function (LD1); the second (LD2) and third linear discriminants (LD3) explained an additional 17% and 8% of the variation, respectively (Table 5).

Table 5. Eigenvalues and proportion of variance explained (Proportion) for each discriminant function in two LDAs (LDA1 and LDA2) and test statistics (Wilks' Λ), significance (P-value) and apparent classification error rate (APER) for the overall model.

Model    Function    Eigenvalue    Proportion    Wilks' Λ (P-value)    APER
LDA1     LD1         8.7757        0.7505        0.1865 (<0.0001)      18.64%
         LD2         4.1876        0.1709
         LD3         2.8400        0.0786
LDA2     LD1         8.9732        0.7721        0.1134 (<0.0001)      12.50%
         LD2         4.1718        0.1669
         LD3         2.5238        0.0611
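The Proportion column of Table 5 can be recovered from the Eigenvalue column by squaring each value and dividing by the sum of squares, as the sketch below shows. This matches how a "proportion of trace" is derived from the singular values of a linear discriminant fit; whether the tabulated eigenvalues are in fact such singular values is an inference from the arithmetic, not something the text states, and the function name is illustrative.

```python
def proportion_of_trace(values):
    """Square each value and normalize so the proportions sum to 1."""
    squares = [v * v for v in values]
    total = sum(squares)
    return [s / total for s in squares]


# "Eigenvalue" column of Table 5 for LD1, LD2 and LD3.
lda1 = proportion_of_trace([8.7757, 4.1876, 2.8400])
lda2 = proportion_of_trace([8.9732, 4.1718, 2.5238])

print("LDA1:", [round(p, 4) for p in lda1])
print("LDA2:", [round(p, 4) for p in lda2])
```

The results agree with the tabulated proportions to within rounding in the last reported digit, whereas normalizing the unsquared values does not.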

Figure 6. Distribution of APERs from randomization tests for (a) LDA1, (b) QDA and (c) LDA2. The vertical line represents the APER of the empirically derived clusters that were grouped based on genetic similarity using GENELAND.

Cross-validation of the APERs by reconstructing and testing the models with training and validation datasets further strengthens these findings. The APER of the LDA with cross-validation increased only slightly to 20.83%, whereas the APER of the QDA decreased with cross-validation to 17.39%. LDA2 decreased the APER to 12.5%, a value significantly less than the APER of comparative randomization tests (range = 30% to 70%, mean = 50.41%; Figure 6c). Wilks' Λ decreased to 0.1134 (P < 0.001) compared to the range of 0.3257 to 0.8920 (mean = 0.6338) for the randomization tests. The influence of the linear discriminants in LDA2 was similar to that in LDA1, with LD1, LD2 and LD3 explaining 77%, 17% and 6% of the variance, respectively (Table 5). LDA1 and LDA2 yielded concordant results, suggesting that while the covariance matrix of cluster 2 in LDA1 was significantly different, the observed difference may have been related to the disproportionate sample sizes. The standardized linear discriminant scaling coefficients of the predictor variables can be used to compare the discriminating power of each predictor variable. The most influential variables in LD1 were elevation and the proportion of tree and shrub cover. The proportion of cover attributable to trees and shrubs, along with subshrubs, had the greatest influence on LD2 (Table 6). LD1 discriminated individuals in cluster 2 from those attributed to clusters 3 and 5. LD2 separated clusters 3 and 5 from one another (Figure 7). Cluster 4, while appearing to settle between the other three clusters, was differentiated in a third dimension by LD3 (Figure 8).

Table 6. Standardized linear discriminant scaling coefficients for LDA1 and LDA2. Model Elevation %Trees %Shrubs %Subshrubs %Grass %Forbs Simpsons_SDI %Rock Aspect Habitat Slope LD1 1.3679 LDA1 LD2 0.1996 LDA2 LD2 LD3 0.0290 -0.0826

LD3 0.0702

LD1 1.7972

0.7520 -1.2603 -0.1941 0.5850 -1.0889 -0.0963 -0.7185 0.5060 0.6302

0.6837 -1.2691 -0.3895 0.6862 -1.1385 -0.5645 -0.8360 0.2339 0.5851

-0.1897 -0.6121 -0.3555 -0.0702 -0.4090 0.2926 0.0748

-0.1349 -0.4870 -0.4451 0.0992 -0.4731 0.0740 0.0285

0.2198 -0.4364

0.2409 -0.1372

0.1646 -0.6695 -0.3009 0.3714 -0.4012 0.5524 0.3630

0.1645 -0.6398 -0.3090 0.5816 -0.2883 0.4478 0.5882

0.3770 -0.0215

0.0456 -0.0623

0.2652 -0.5919 -0.0780

0.0587 -0.5187 -0.1457

Both models were significant (P < 0.0001), with LDA1 explaining 81.35% of the variation in the data and LDA2 explaining 88.66% of the variation. Thus, all three models can be used to confidently discriminate clusters based on habitat characteristics, which correspond to the a priori genetic clusters identified using GENELAND. Figure 9 shows habitat typical of each of the clusters. Cluster 2 (top left), characterized by lower elevation and low levels of trees and shrubs, is dominated by subshrubs. Dominant species in this cluster included succulents such as lechuguilla (Agave lechuguilla), sotol (Dasylirion leiophyllum), prickly pear (Opuntia spp.) and yucca (Yucca spp.). Cluster 3 (top right) was distinguished by higher elevations and moderate to low levels of trees and shrubs, with dominant species including grama (Bouteloua spp.), oak scrubs (Quercus spp.) and mountain mahogany (Cercocarpus montanus). Cluster 4 (bottom left) was found at moderate elevations along the escarpment of the mountain range and had a fairly even distribution of trees and shrubs with a broad diversity of species. Cluster 5 (bottom right) was characterized by the highest elevations and a high proportion of trees and shrubs. This cluster was dominated by oaks, maples (Acer grandidentatum), and piñon (Pinus edulis) and ponderosa (Pinus ponderosa) pines. Junipers (Juniperus spp.) were common in all the clusters.

Figure 7. Traditional scatter plots of individuals against LD1 and LD2 for (a) LDA1 and (b) LDA2. The numbers correspond to the clusters individuals were assigned to based on the results of GENELAND.

Figure 8. Three-dimensional scatter plot of individuals against all three linear discriminants for LDA2. Cluster numbers correspond to clusters identified a priori with GENELAND.

Figure 9. Photographs of habitat typically characterizing each of the four clusters: cluster 2 (top left), cluster 3 (top right), cluster 4 (bottom left), cluster 5 (bottom right).

DISCUSSION

A limited number of studies have assessed natal habitat-biased dispersal or habitat-specific subdivisions, but there has been growing support that habitat and environmental variables influence population genetic structure and may effectively influence speciation (Doebeli and Dieckmann 2003). Studies of the behavior and ecology of a variety of species have provided evidence that individuals do tend to disperse to habitats similar to their natal habitats (Haughland and Larsen 2004, Pilot et al. 2006, Mabry and Stamps 2008, Sacks et al. 2004, 2005, 2008). Further, studies that have found genetic structure within populations in the absence of permanent physical barriers to dispersal have attributed their observations, at least in part, to natal habitat-biased dispersal (Carmichael et al. 2001, Rueness et al. 2003, Ernest et al. 2003). The tendency of individuals to disperse to habitats similar to their natal habitat may be a fairly common driver of genetic structure within populations (Sacks et al. 2004). Our results demonstrate that ringtails were assigned with a high level of confidence to their a priori genetic clusters based exclusively on the habitat and landscape characteristics collected at their capture locations. All the DAs proved to be robust, assigning a significantly high proportion (>80%) of individuals to their observed genetic cluster and clearly indicating significant discriminant power that cannot be explained by random chance alone. Although ringtails have the ability to exploit nearly all habitats, they appear to preferentially select for familiar habitats
during dispersal and settlement. Apparently, the tendency of individuals to select for familiar habitat is strong enough to produce fine scale genetic structure in numerous species in the absence of physical barriers to dispersal. Coyotes have been studied at multiple scales in California. Rather than displaying panmixia, or a pattern of isolation by distance, coyote populations exhibited a structure that was associated with major breaks in habitat types in an otherwise unobstructed landscape; the impact of individuals tending to disperse to similar habitats was sufficient to generate habitat-specific fine scale genetic structuring (Sacks et al. 2004, 2005, 2008). Similarly, gray wolves in a variety of locales reveal genetic structuring related to habitat without the presence of sufficient barriers to dispersal. In northwestern Canada, gray wolves had higher levels of gene flow on either side of a river than across the river, even though the river was frozen for up to 8 months out of the year and should not have constituted a barrier to gene flow (Carmichael et al. 2001). Apparently the specialization of wolves on different prey on either side of the river led to their genetic differences. Wolves spend a considerable amount of time with their natal pack before dispersing, and this prolonged exposure to specific habitats and diet specialization may influence settlement decisions. Wolves in Europe, Canada and across all of North America also exhibit population genetic structure presumably related to climate, habitat and specialization for particular prey (Geffen et al. 2004, Pilot et al. 2006, Musiani et al. 2007). Evidence of more profound ecological
divergence has also been found in a wide-ranging marine predator, the killer whale (Orcinus orca). Genetic differentiation in both North Pacific and North Atlantic killer whales has been driven by prey specialization and has resulted in multiple sympatric groups within each region that differ not only in foraging strategies, but also in behavioral and morphological traits (Herman et al. 2005, Foote et al. 2009). Whether terrestrial or marine, all of these aforementioned species are large and highly mobile, and the patterns of genetic structure or ecological divergence attributable to habitat-specific dispersal or specialization on a particular prey were assessed at large spatial scales. But the proclivity of individuals to select a particular habitat or a specific prey species with which they are familiar should increase their fitness regardless of their size or the scale at which these processes occur. Thus, we might expect such patterns to be evident at very fine spatial scales as long as the scale reflects the ecology of the target species. Habitat and environmental factors have been shown to influence fine-scale population structure and dispersal in smaller bodied, less vagile species as well. Brush mice (Peromyscus boylii) located in either woodland or chaparral habitats in close proximity are more likely to settle within their natal habitat following dispersal, even when the dispersal path transected more than one habitat type (Mabry and Stamps 2008). Likewise, dispersing red squirrels (Tamiasciurus hudsonicus) exposed to heterogeneous habitat stages of coniferous forest settled more often in habitat similar to the natal habitat, even when that habitat was of relatively poor quality (Haughland
and Larsen 2004). Experience with a particular habitat may offset the poorer quality by providing individuals with a degree of familiarity that results in a competitive edge.

Ringtails as a Model for Assessing Fine Scale Genetic Structure

Ringtails exploit diverse habitats, and the genetic structure we observed could be related to individuals specializing on particular denning sites, foraging habitat or prey. Ringtails den in nearly any small opening or cavity, including rock crevices, hollow trees, snags, woodrat nests, brush piles and buildings (Taylor 1954, Toweill 1976, Trapp 1978, Callas 1987). They use a wide variety of habitats but have shown preference for some. Ringtails captured in a riparian forest habitat in CA showed an affinity for riparian and cottonwood areas (Lacy 1983), whereas ringtails captured along a riparian zone in AZ preferentially selected for riparian and juniper habitats over other available habitat types (Yarchin 1990). In TX, ringtails showed a preference for and disproportionate use of specific catclaw dominated habitats (Ackerson 2001). They also have a catholic diet, foraging on plants, insects, small mammals, lizards, amphibians, birds and carrion (Taylor 1954, Toweill and Teer 1977, Alexander et al. 1994). Although the mechanism driving the habitat-specific genetic structure we observed is uncertain, examining individual variation in ringtail preferences for dens, foraging habitat and/or prey may be the next step.

The Guadalupe Mountains also provided a unique environment to conduct this assessment for several reasons. The remoteness, ruggedness and protection of lands (i.e., NPS and USFS lands) in the study area have limited development and consequently human disturbance. There are no major roads or waterways that could serve as physical barriers to dispersal for such a small carnivore, and the extreme topography, diversity in elevation and complexity in soil types result in high habitat diversity within a small geographic area. The extreme topographic features influence habitat, but it seems unlikely that these features directly influenced clustering by acting as barriers to dispersal, as ringtails have a well developed ability to climb and exploit these features (Trapp 1972). Indeed, after release, we observed ringtails climb vertical walls and trees with ease. The pattern of relatedness among genetic clusters also corresponded with elevation and habitat. We were able to identify several hierarchical levels of genetic structure within a small geographic area, which suggests that the processes affecting structure are complex. The highest level of genetic differentiation identified by STRUCTURE revealed two subpopulations, one of which contained over 94% of the individuals. In a continuously distributed population, some level of admixture between adjacent subpopulations is expected. While the level of gene flow between adjacent subpopulations is sufficient to indicate high levels of connectivity, the level of admixture does not appear to be sufficient to oppose local genetic processes that contribute to more fine scale genetic structuring (Sacks et al. 2008).

When both STRUCTURE and GENELAND are used, GENELAND may detect more subtle population structure (Coulon et al. 2006, Sacks et al. 2008). Through the incorporation of geographic locations as a priori information, GENELAND revealed additional population structure in our study as well. The six genetically identified clusters showed weak to moderate levels of genetic differentiation. Maps of the six clusters indicated highly intricate spatial domains, with individuals in close proximity being assigned to different clusters and other more distantly located individuals grouped together. This pattern appeared to be influenced to a large degree by elevation and associated habitat complexity. The elevation range is greatest at the western end of the Guadalupe Mountains, contributing to a broad diversity of habitats. The eastern end of the mountain range is flatter and consequently more homogeneous; the western portion of the range exhibited a greater number of genetic clusters, whereas the entire eastern portion of the range supported only one large cluster (Figure 4). A very similar pattern was identified in coyotes, in which population boundaries corresponded very closely to habitat breaks (Sacks et al. 2004). Further analysis of this pattern revealed that in two adjacent regions with contrasting levels of habitat heterogeneity, the more heterogeneous landscape supported cryptic genetic structure, whereas the more homogeneous landscape supported a large single population (Sacks et al. 2008). Standard genetic measures of the six clusters revealed a significant deficiency of heterozygotes in clusters 2 and 3 that could potentially indicate the presence of null alleles or additional cryptic subdivisions as a result of a Wahlund effect (Hedrick 2005). Further analysis did not detect cryptic substructure in cluster 2, but did uncover three hierarchical subdivisions in cluster 3. Presence of scat across the study area indicated that ringtails are widespread, and this, coupled with the cryptic subdivisions of cluster 3, suggests that even greater structuring may be revealed with a more intensive sampling effort.

Discriminant Analysis of Habitat-Specific Clustering

The dispersal capabilities of ringtails have not yet been assessed, but their small size and previous studies on home range suggest that they have relatively low vagility (Trapp 1978, Toweill and Teer 1981, Callas 1987, Yarchin 1990), an important issue when selecting the appropriate scale of study to detect patterns of habitat-specific clustering. We took a new approach to assess habitat-specific clustering in this mesocarnivore. By coupling emerging population genetic techniques with established statistical techniques we were able to demonstrate that ringtails do, in the absence of physical barriers to dispersal, exhibit habitat-specific clustering that is strong enough to drive fine scale genetic structure. Using the clusters identified by GENELAND as a priori groupings, we were able to show via DA that individuals could be assigned with a high level of confidence to their observed clusters based exclusively on habitat and landscape variables.

In both LDA1 and LDA2, elevation was the variable with the greatest discriminating power, followed by the proportion of cover attributable to trees and shrubs. The impact of elevation may be in part related to its influence on vegetative communities, but we propose that the influence of elevation may also be related to the evolutionary divergence of physiological traits. Minimum resting metabolic rate (MRM) was investigated for ringtails from two differing habitats, low lying desert and higher elevation montane habitat (Chevalier 1991). The larger bodied desert ringtails were expected to have a higher MRM than their montane counterparts, but field energetics revealed that these larger bodied ringtails had reduced their MRM to a level lower than that of the smaller montane ringtails (Chevalier 1991, Garland and Adolph 1994). The divergence in MRM between these two habitat-specific ringtail populations occurred at a much faster rate than expected (Garland and Adolph 1994). It is possible that habitat-specific dispersal tendencies could be contributing to divergence in physiological traits, as the microenvironments in high elevation and low elevation sites would be expected to differ, resulting in differential selective pressures that could influence energy and heat transfer (McNab 2002). The proportions of cover attributable to trees and shrubs were also important variables in differentiating particular clusters. Trees and shrubs are important sources of cover and protection from predators; they may influence the type and abundance of dens and provide different prey items, which could influence foraging strategies. For example, ringtails rely heavily on plant material in some areas, and the availability of such food items could influence settling decisions. Additionally, a three-dimensional model of home ranges may be more appropriate than a two-dimensional model for this species (Lacy 1983, Yarchin 1990). The presence of tree and shrub cover relates directly to this premise, as animals may be more likely to disperse to vertically complex habitats similar to their natal habitat. In conclusion, these results demonstrate that ringtails, though having the ability to exploit nearly all habitats, appear to preferentially select for habitats similar to their natal habitat in heterogeneous landscapes and that this tendency is strong enough to produce fine scale genetic structure. The mechanism(s) driving this selection may be related to foraging behavior and prey specialization, habitat structure or physiological adaptation, and may be influenced by all of these. The maintenance of fine scale genetic structure and ecological heterogeneity is a key to the conservation of biodiversity. Understanding the role environmental factors play in directing habitat-specific clustering can help direct conservation efforts toward appropriate corridors, provide insight into animal movements and dispersal patterns and help determine the appropriate scale of study and management for species (Manel et al. 2003).

REFERENCES

Ackerson BK (2001) Characteristics of a ringtail population in Elephant Mountain Wildlife Management Area, Texas. M.S. Thesis, Sul Ross State University.
Ackerson BK, Harveson LA (2006) Characteristics of a ringtail (Bassariscus astutus) population in Trans Pecos, Texas. Texas Journal of Science, 58, 169-184.
Alexander LF, Verts BJ, Farrell TP (1994) Diets of ringtails (Bassariscus astutus) in Oregon. Northwestern Naturalist, 75, 97-101.
Bolnick DI, Svanbäck R, Araújo MS, Persson L (2007) Comparative support for the niche variation hypothesis that more generalized populations also are more heterogeneous. Proceedings of the National Academy of Sciences, 104 (24), 10075-10079.
Box GEP (1949) A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317-346.
Callas R (1987) Ringtail (Bassariscus astutus) den and habitat use in Northwestern California. M.S. Thesis, Humboldt State University.
Carmichael LE, Nagy JA, Larter NC, Strobeck C (2001) Prey specialization may influence patterns of gene flow in wolves of the Canadian Northwest. Molecular Ecology, 10, 2787-2798.
Chevalier CD (1991) Aspects of thermoregulation and energetics in the Procyonidae (Mammalia: Carnivora). Ph.D. Dissertation, University of California, Irvine.
Clark JS (2010) Individuals and the variation needed for high species diversity in forest trees. Science, 327, 1129-1132.
Coulon A, Guillot G, Cosson JF et al. (2006) Genetic structure is influenced by landscape features: empirical evidence from a roe deer population. Molecular Ecology, 15, 1669-1679.
DeYoung RW, Zamorano A, Mesenbrink BT et al. (2009) Landscape-genetic analysis of population structure in the Texas gray fox oral rabies vaccination zone. Journal of Wildlife Management, 73, 1292-1293.
Doebeli M, Dieckmann U (2003) Speciation along environmental gradients. Nature, 421, 259-264.

Ernest HB, Boyce WM, Bleich VC et al. (2003) Genetic structure of mountain lion (Puma concolor) populations in California. Conservation Genetics, 4, 353-366.
Estes JA, Riedman ML, Staedler MM et al. (2003) Individual variation in prey selection by sea otters: patterns, causes and implications. Journal of Animal Ecology, 72, 144-155.
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology, 14, 2611-2620.
Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics, 164, 1567-1587.
Finch WH, Schneider MK (2005) Misclassification rates for four methods of group classification: Impact of predictor distribution, covariance inequality, effect size, sample size, and group size ratio. Educational and Psychological Measurement, 66, 240-257.
Finch WH, Schneider MK (2007) Classification accuracy of neural networks vs. discriminant analysis, logistic regression, and classification and regression trees. Methodology, 3 (2), 47-57.
Foote AD, Newton J, Piertney SB, Willerslev E, Gilbert MTP (2009) Ecological, morphological and genetic divergence of sympatric North Atlantic killer whale populations. Molecular Ecology, 18, 5207-5217.
Garland T, Adolph SC (1994) Why not to do two-species comparative studies: Limitations on inferring adaptation. Physiological Zoology, 67 (4), 797-828.
Geffen E, Anderson MJ, Wayne RK (2004) Climate and habitat barriers to dispersal in the highly mobile grey wolf. Molecular Ecology, 13, 2481-2490.
Guillot G (2008) Inference of structure in subdivided populations at low levels of genetic differentiation: the correlated allele frequency model revisited. Bioinformatics, 24, 2222-2228.
Guillot G, Mortier F, Estoup A (2005) GENELAND: a computer package for landscape genetics. Molecular Ecology Notes, 5, 712-715.

Hardy OJ, Vekemans X (2002) SPAGEDI: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Molecular Ecology Notes, 2, 618-620.
Haughland DL, Larsen KW (2004) Exploration correlates with settlement: red squirrel dispersal in contrasting habitats. Journal of Animal Ecology, 73, 1024-1034.
Hedrick PW (2005) Genetics of Populations, 3rd edn. Jones and Bartlett Publishers, Inc., Boston.
Herman DP, Burrows DG, Wade PR et al. (2005) Feeding ecology of eastern North Pacific killer whales Orcinus orca from fatty acid, stable isotope and organochlorine analyses of blubber biopsies. Marine Ecology Progress Series, 302, 275-291.
Hill C (1996) Geology of the Delaware Basin Guadalupe, Apache, and Glass Mountains New Mexico and West Texas. Permian Basin Section-SEPM Publication No. 96-39.
Lacy MK (1983) Home range size, intraspecific spacing, and habitat preference of ringtails (Bassariscus astutus) in a riparian forest in California. M.A. Thesis, California State University.
Mabry KE, Stamps JA (2008) Dispersing brush mice prefer habitat like home. Proceedings of the Royal Society B, 275, 543-548.
Manel S, Schwartz MK, Luikart G, Taberlet P (2003) Landscape genetics: Combining landscape ecology and population genetics. Trends in Ecology and Evolution, 18, 189-197.
McNab BK (2002) The physiological ecology of vertebrates. Comstock Publishing Associates, Cornell University Press, Ithaca, NY.
Musiani M, Leonard JA, Cluff HD et al. (2007) Differentiation of tundra/taiga and boreal coniferous forest wolves: genetics, coat colour and association with migratory caribou. Molecular Ecology, 16, 4149-4170.
Nordhausen K, Oja H, Tyler DE (2008) Tools for exploring multivariate data: The package ICS. Journal of Statistical Software, 28 (6), 1-32.
Northington DK, Burgess TL (1979) Summary of the vegetative zones of the Guadalupe Mountains National Park, Texas. pp. 51-57 in Genoways HH, Baker RJ, eds. Biological investigations in the Guadalupe Mountains National Park: Proceedings of a symposium. Proceedings and Transactions Series No. 4. U.S. Department of the Interior, National Park Service, Washington, DC.
Pardoe I, Yin X, Cook RD (2007) Graphical tools for quadratic discriminant analysis. Technometrics, 49 (2), 172-183.
Pilot M, Jedrzejewski W, Branicki W et al. (2006) Ecological factors influence population genetic structure of European Wolves. Molecular Ecology, 15, 4533-4553.
Pohar M, Blas M, Turk S (2004) Comparison of logistic regression and linear discriminant analysis: A simulation study. Metodološki zvezki, 1 (1), 143-161.
Poglayen-Neuwall I, Toweill DE (1988) Bassariscus astutus. Mammalian Species, 327, 1-8.
Powell AM (1998) Trees and Shrubs of the Trans-Pecos and Adjacent Areas. University of Texas Press, Austin.
Pritchard JK, Stephens M, Donnelly P (2000) Inference of Population Structure Using Multilocus Genotype Data. Genetics, 155, 945-959.
Pritchard VL, Metcalf JL, Jones K, Martin AP, Cowley DE (2009) Population structure and genetic management of Rio Grande cutthroat trout (Oncorhynchus clarkii virginalis). Conservation Genetics, 10, 1209-1221.
R Development Core Team (2005) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0, URL: http://www.R-project.org.
Raymond M, Rousset F (1995) GENEPOP: population genetics software for exact tests and ecumenicism. Journal of Heredity, 86, 248-249.
Riani M, Atkinson AC (2001) A unified approach to outliers, influence, and transformation in discriminant analysis. Journal of Computational and Graphical Statistics, 10 (3), 513-544.
Rice W (1989) Analyzing tables of statistical tests. Evolution, 43, 223-225.
Roemer GW, Coonan TJ, Garcelon DK, Starbird CH, McCall JW (2000) Spatial and temporal variation in the seroprevalence of canine heartworm antigen in the island fox. Journal of Wildlife Diseases, 36 (4), 723-728.


Rueness EK, Stenseth NC, O'Donoghue M et al. (2003) Ecological and genetic spatial structuring in the Canadian lynx. Nature, 425, 69-72.
Sacks BN, Bannasch DL, Chomel BB, Ernest HB (2008) Coyotes demonstrate how habitat specialization by individuals of a generalist species can diversify populations in a heterogeneous ecoregion. Molecular Biology and Evolution, 25, 1384-1394.
Sacks BN, Brown SK, Ernest HB (2004) Population structure of California coyotes corresponds to habitat-specific breaks and illuminates species history. Molecular Ecology, 13, 1265-1275.
Sacks BN, Mitchell BR, Williams CL, Ernest HB (2005) Coyote movements and social structure along a cryptic population genetic subdivision. Molecular Ecology, 14, 1241-1249.
Schweizer RM, Roemer GW, Pollinger JP, Wayne RK (2009) Characterization of 15 tetranucleotide microsatellite markers of ringtails (Bassariscus astutus). Molecular Ecology Resources, 9, 210-212.
Sever M, Lajovic J, Rajer B (2005) Robustness of the Fisher's discriminant function to skew-curved normal distribution. Metodološki zvezki, 2 (2), 231-242.
Stefenon VM, Steiner N, Guerra MP, Nodari RO (2009) Integrating approaches towards the conservation of forest genetic resources: a case study of Araucaria angustifolia. Biodiversity and Conservation, 18, 2433-2448.
Svanbäck R, Bolnick DI (2007) Intraspecific competition drives increased resource use diversity within a natural population. Proceedings of the Royal Society B, 274, 839-844.
Tabachnick BG, Fidell LS (1996) Using Multivariate Statistics, 3rd edn. Harper Collins College Publishers, New York.
Taylor WP (1954) Food habits and notes on life history of the ring-tailed cat in Texas. Journal of Mammalogy, 35, 55-63.
Tinker MT, Costa DP, Estes JA et al. (2007) Individual dietary specialization and dive behaviour in the California sea otter: Using archival time-depth data to detect alternative foraging strategies. Deep-Sea Research Part II-Topical Studies in Oceanography, 54, 330-342.
Toweill DE (1976) Movements of ringtails in Texas' Edwards Plateau Region. M.S. Thesis, Texas A&M University.

Toweill DE, Teer JG (1977) Food habits of ringtails in the Edwards Plateau region of Texas. Journal of Mammalogy, 58, 660-663.
Toweill DE, Teer JG (1981) Home range and den habits of Texas ringtails (Bassariscus astutus flavus). pp. 1103-1120 in Proceedings of the Worldwide Furbearer Conferences.
Trapp GR (1972) Some anatomical and behavioral adaptations of ringtails, Bassariscus astutus. Journal of Mammalogy, 53, 549-557.
Trapp GR (1978) Comparative behavioral ecology of the ringtail and gray fox in southwestern Utah. Carnivore, 1, 3-31.
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358-1370.
Yarchin JC (1990) Home range use by ringtails in a southwestern riparian area. pp. 156-164 in Managing wildlife in the Southwest: Proceedings of the symposium.


APPENDIX A

R Programming Language Code for Discriminant Analyses, Testing for Violations of Model Assumptions and Randomization Tests

#Import data to R Programming Interface
data.path<-"~/Carnivore_Research/Carnivore_Data/Analysis/R_DATABASES/"
data.file<-"DFA_data"
dfa.data<-read.table(file=paste(data.path, data.file, ".txt", sep=""), header=TRUE, sep="\t", na.strings="NA", stringsAsFactors=TRUE)   #Read text columns as factors

#Convert population assignments to factors; habitat types to numeric
dfa.data$PopAssigment<-factor(dfa.data$PopAssigment)   #Convert to factors
levels(dfa.data$PopAssigment)<-c(1:6)   #Reassign values 1-6
levels(dfa.data$HABITAT)<-c(1:6)   #Convert habitat levels to
dfa.data$HABITAT<-as.numeric(dfa.data$HABITAT)   #numeric values 1-6

#Rescale variables; Store population assignments as a vector
sdfa.data<-scale(dfa.data[,2:20], scale=TRUE)   #Center variables around means and scale by SD
sdfa.data<-as.data.frame(sdfa.data)   #Convert to data frame
PopAssigment<-dfa.data$PopAssigment   #Store original pop. assignments
sdfa.data<-cbind(PopAssigment, sdfa.data)   #Store PopAssigment back to df

#Determine and store size of each cluster
sdfa.25.data<-sdfa.data   #DF to store scaled data
l1<-length(sdfa.25.data[PopAssigment==1,1])   #Store length of cluster 1
l2<-length(sdfa.25.data[PopAssigment==2,1])   #Store length of cluster 2
l3<-length(sdfa.25.data[PopAssigment==3,1])   #Store length of cluster 3
l4<-length(sdfa.25.data[PopAssigment==4,1])   #Store length of cluster 4
l5<-length(sdfa.25.data[PopAssigment==5,1])   #Store length of cluster 5
l6<-length(sdfa.25.data[PopAssigment==6,1])   #Store length of cluster 6
c.reps<-c(l1,l2,l3,l4,l5,l6)   #Create vector of reps

#Replace clusters 1 and 6 with NA and save, Remove clusters 1 and 6
sdfa.25.data$PopAssigment<-rep(c(NA, "2", "3", "4", "5", NA), c.reps)
sdfa.25.data$PopAssigment<-as.factor(sdfa.25.data$PopAssigment)
sdfa.25.data<-sdfa.25.data[c(-1, -2, -3, -4, -5, -6),]   #Drop rows 1-6 (cluster 1)
sdfa.25.data<-sdfa.25.data[c(-144, -145, -146, -147),]   #Drop rows 144-147 (cluster 6)
PopAssigment<-sdfa.25.data$PopAssigment


#Testing homogeneity of covariance matrices
library(psych)
x2<-subset(sdfa.25.data, sdfa.25.data[,1]==2)   #Select subset of cluster 2
x3<-subset(sdfa.25.data, sdfa.25.data[,1]==3)   #Select subset of cluster 3
x4<-subset(sdfa.25.data, sdfa.25.data[,1]==4)   #Select subset of cluster 4
x5<-subset(sdfa.25.data, sdfa.25.data[,1]==5)   #Select subset of cluster 5
n.vars<-11   #Store number of variables
x2[,1]<-NULL; x3[,1]<-NULL; x4[,1]<-NULL; x5[,1]<-NULL   #Remove pop. assignments
#Store each data frame as a matrix
x2<-as.matrix(x2[,1:n.vars]); x3<-as.matrix(x3[,1:n.vars])
x4<-as.matrix(x4[,1:n.vars]); x5<-as.matrix(x5[,1:n.vars])
#Save covariance and correlation matrices for each cluster
s2<-cov(x2); s3<-cov(x3); s4<-cov(x4); s5<-cov(x5)
c2<-cor(x2); c3<-cor(x3); c4<-cor(x4); c5<-cor(x5)
n2<-nrow(x2); n3<-nrow(x3); n4<-nrow(x4); n5<-nrow(x5); n.rows<-c(n2,n3,n4,n5)
#Conduct Box's chi-squared test for equality of covariance matrices
cortest(c2,c3, n2, n3); cortest(c2,c4, n2, n4); cortest(c2,c5, n2, n5)
cortest(c3,c4, n3, n4); cortest(c3,c5, n3, n5); cortest(c4,c5, n4, n5)

#Test for outliers with Mahalanobis Distance-Cluster 2
library(ICS)
library(MASS)   #Provides cov.rob
colMeans(x2)   #Calc. vector of means for variables in cluster 2
mahalanobis(x2, colMeans(x2), s2)   #Calculate Mahalanobis D2 based on sample mean and cov matrix
mahal.x2<-sqrt(mahalanobis(x2, colMeans(x2), s2))   #Store Mahalanobis D for obs
covmve.x2<-cov.rob(x2)   #Compute multivariate location and scale with high breakdown pt
mahal2.x2<-sqrt(mahalanobis(x2,covmve.x2$center, covmve.x2$cov))   #Calc. Mahalanobis D based on minimum volume ellipsoid (MVE)
max.mahal.x2<-max(c(mahal.x2,mahal2.x2))   #Store the max of the 2 Mahalanobis distances
out.id.x2<-ifelse(mahal2.x2<=sqrt(qchisq(0.975,11)),0,1)   #Outliers=1, Not an outlier=0
#Plot outliers identified in Group 2-Plot distances against the obs number
par(mfrow=c(1,2), las=1)


plot(mahal.x2, xlab="index", ylab="Mahalanobis Distance", ylim=c(0,max.mahal.x2), col=out.id.x2+1, pch=15*out.id.x2+1)
abline(h=sqrt(qchisq(0.975,11)))
plot(mahal2.x2, xlab="index", ylab="Robust Mahalanobis Distance", ylim=c(0,max.mahal.x2), col=out.id.x2+1, pch=15*out.id.x2+1)
abline(h=sqrt(qchisq(0.975,11)))
par(mfrow=c(1,1))

#Test for outliers with Mahalanobis Distance-Cluster 3
colMeans(x3)   #Calc. vector of means for variables in cluster 3
mahalanobis(x3, colMeans(x3), s3)   #Calculate Mahalanobis D2 based on sample mean and cov matrix
mahal.x3<-sqrt(mahalanobis(x3, colMeans(x3), s3))   #Store the Mahalanobis D of obs
covmve.x3<-cov.rob(x3)   #Compute multivariate location and scale with high breakdown pt
mahal2.x3<-sqrt(mahalanobis(x3,covmve.x3$center, covmve.x3$cov))   #Calc. Mahalanobis D based on minimum volume ellipsoid (MVE)
max.mahal.x3<-max(c(mahal.x3,mahal2.x3))   #Store the max of the 2 Mahalanobis distances
out.id.x3<-ifelse(mahal2.x3<=sqrt(qchisq(0.975,11)),0,1)   #Outliers=1, Not an outlier=0
#Plot outliers identified in Group 3-Plot distances against the obs number
par(mfrow=c(1,2), las=1)
plot(mahal.x3, xlab="index", ylab="Mahalanobis Distance", ylim=c(0,max.mahal.x3), col=out.id.x3+1, pch=15*out.id.x3+1)
abline(h=sqrt(qchisq(0.975,11)))
plot(mahal2.x3, xlab="index", ylab="Robust Mahalanobis Distance", ylim=c(0,max.mahal.x3), col=out.id.x3+1, pch=15*out.id.x3+1)
abline(h=sqrt(qchisq(0.975,11)))
par(mfrow=c(1,1))

#Test for outliers with Mahalanobis Distance-Cluster 4
colMeans(x4)   #Calc. vector of means for variables in cluster 4
mahalanobis(x4, colMeans(x4), s4)   #Calculate Mahalanobis D2 based on sample mean and cov matrix
mahal.x4<-sqrt(mahalanobis(x4, colMeans(x4), s4))   #Store the Mahalanobis D of obs
covmve.x4<-cov.rob(x4)   #Compute multivariate location and scale with high breakdown pt
mahal2.x4<-sqrt(mahalanobis(x4,covmve.x4$center, covmve.x4$cov))

#(mahal2.x4 above is the Mahalanobis D based on the minimum volume ellipsoid, MVE)
max.mahal.x4<-max(c(mahal.x4,mahal2.x4))   #Store the max of the 2 Mahalanobis distances
out.id.x4<-ifelse(mahal2.x4<=sqrt(qchisq(0.975,11)),0,1)   #Outliers=1, Not an outlier=0
#Plot outliers identified in Group 4-Plot distances against the obs number
par(mfrow=c(1,2), las=1)
plot(mahal.x4, xlab="index", ylab="Mahalanobis Distance", ylim=c(0,max.mahal.x4), col=out.id.x4+1, pch=15*out.id.x4+1)
abline(h=sqrt(qchisq(0.975,11)))
plot(mahal2.x4, xlab="index", ylab="Robust Mahalanobis Distance", ylim=c(0,max.mahal.x4), col=out.id.x4+1, pch=15*out.id.x4+1)
abline(h=sqrt(qchisq(0.975,11)))
par(mfrow=c(1,1))

#Test for outliers with Mahalanobis Distance-Cluster 5
colMeans(x5)   #Calc. vector of means for variables in cluster 5
mahalanobis(x5, colMeans(x5), s5)   #Calculate Mahalanobis D2 based on sample mean and cov matrix
mahal.x5<-sqrt(mahalanobis(x5, colMeans(x5), s5))   #Store the Mahalanobis D of obs
covmve.x5<-cov.rob(x5)   #Compute multivariate location and scale with high breakdown pt
mahal2.x5<-sqrt(mahalanobis(x5,covmve.x5$center, covmve.x5$cov))   #Calc. Mahalanobis D based on minimum volume ellipsoid (MVE)
max.mahal.x5<-max(c(mahal.x5,mahal2.x5))   #Store the max of the 2 Mahalanobis distances
out.id.x5<-ifelse(mahal2.x5<=sqrt(qchisq(0.975,11)),0,1)   #No outliers identified-Group 5

#Remove outliers and store as a new dataframe
outliers.x<-c(out.id.x2,out.id.x3,out.id.x4, out.id.x5)   #Store outlier IDs into single vector
sdfa.25no.data<-cbind(outliers.x,sdfa.25.data)   #Add vector indicating outliers to df
os<-which(sdfa.25no.data$outliers.x==1)   #Store row IDs of obs classified as outliers
if (length(os)>0) sdfa.25no.data<-sdfa.25no.data[-os,]   #Remove obs classified as outliers
sdfa.25no.data$outliers.x<-NULL   #Remove column indicating outliers

#Test homogeneity of covariance matrices following the removal of outliers
library(psych)
x2<-subset(sdfa.25no.data, sdfa.25no.data[,1]==2)   #Select subset of cluster 2
x3<-subset(sdfa.25no.data, sdfa.25no.data[,1]==3)   #Select subset of cluster 3
x4<-subset(sdfa.25no.data, sdfa.25no.data[,1]==4)   #Select subset of cluster 4
x5<-subset(sdfa.25no.data, sdfa.25no.data[,1]==5)   #Select subset of cluster 5
n.vars<-11
x2[,1]<-NULL; x3[,1]<-NULL; x4[,1]<-NULL; x5[,1]<-NULL   #Remove pop. assignments
#Store each data frame as a matrix
x2<-as.matrix(x2[,1:n.vars]); x3<-as.matrix(x3[,1:n.vars])
x4<-as.matrix(x4[,1:n.vars]); x5<-as.matrix(x5[,1:n.vars])
#Save covariance and correlation matrices for each cluster
s2<-cov(x2); s3<-cov(x3); s4<-cov(x4); s5<-cov(x5)
c2<-cor(x2); c3<-cor(x3); c4<-cor(x4); c5<-cor(x5)
n2<-nrow(x2); n3<-nrow(x3); n4<-nrow(x4); n5<-nrow(x5); n.rows<-c(n2,n3,n4,n5)
#Conduct Box's chi-squared test for equality of covariance matrices
cortest(c2,c3, n2, n3); cortest(c2,c4, n2, n4); cortest(c2,c5, n2, n5)
cortest(c3,c4, n3, n4); cortest(c3,c5, n3, n5); cortest(c4,c5, n4, n5)

#Test for normality by assessing levels of skewness for pops 2:5 without outliers
library(moments)
sks<-c("HABITAT","SIMPSONS_SDI","PER_SLOPE","ASPECT","ELEVATION", "PER_ROC","PER_TR", "PER_SH", "PER_SS", "PER_GR", "PER_FO")
sks<-data.frame(sks)   #Create df to store skewness
#Measure skewness of each of the 11 variables (columns 1:11 of each cluster matrix)
sk.x2<-as.vector(apply(x2, 2, skewness, na.rm=TRUE))   #Save skewness of cluster 2
sk.x3<-as.vector(apply(x3, 2, skewness, na.rm=TRUE))   #Save skewness of cluster 3
sk.x4<-as.vector(apply(x4, 2, skewness, na.rm=TRUE))   #Save skewness of cluster 4
sk.x5<-as.vector(apply(x5, 2, skewness, na.rm=TRUE))   #Save skewness of cluster 5
sks<-cbind(sks,sk.x2,sk.x3,sk.x4,sk.x5)   #Save all skewness values to df
#Compare skewness to that expected under a normal distribution (2 x SE of skewness)
ses.x2<-sqrt(6/length(x2[,1]))*2; ses.x3<-sqrt(6/length(x3[,1]))*2
ses.x4<-sqrt(6/length(x4[,1]))*2; ses.x5<-sqrt(6/length(x5[,1]))*2
ses<-c(ses.x2,ses.x3,ses.x4,ses.x5)
not.normal<-0; normal<-0   #Initialize counters
for(i in 1:4){   #Loop over all clusters
  sest<-ses[i]
  for(n in 1:11){   #Loop over all variables
    if (abs(sks[n,i+1])>sest) not.normal<-not.normal+1 else normal<-normal+1   #Store if skewness departs from normality
  }
}

#Test for normality by assessing levels of kurtosis for pops 2:5 without outliers
kts<-c("HABITAT","SIMPSONS_SDI","PER_SLOPE","ASPECT","ELEVATION", "PER_ROC","PER_TR", "PER_SH", "PER_SS", "PER_GR", "PER_FO")
kts<-data.frame(kts)   #Create df to store kurtosis
#Measure excess kurtosis of each variable; kurtosis() in moments returns 3 for a
#normal distribution, so 3 is subtracted before comparing against 2 x SE
kt.x2<-as.vector(apply(x2, 2, kurtosis, na.rm=TRUE))-3   #Save kurtosis of cluster 2
kt.x3<-as.vector(apply(x3, 2, kurtosis, na.rm=TRUE))-3   #Save kurtosis of cluster 3
kt.x4<-as.vector(apply(x4, 2, kurtosis, na.rm=TRUE))-3   #Save kurtosis of cluster 4
kt.x5<-as.vector(apply(x5, 2, kurtosis, na.rm=TRUE))-3   #Save kurtosis of cluster 5
kts<-cbind(kts,kt.x2,kt.x3,kt.x4,kt.x5)   #Save all kurtosis values to df
#Compare kurtosis to that expected under a normal distribution (2 x SE of kurtosis)
sek.x2<-sqrt(24/length(x2[,1]))*2; sek.x3<-sqrt(24/length(x3[,1]))*2
sek.x4<-sqrt(24/length(x4[,1]))*2; sek.x5<-sqrt(24/length(x5[,1]))*2
sek<-c(sek.x2,sek.x3,sek.x4,sek.x5)
not.normal<-0; normal<-0   #Initialize counters
for(i in 1:4){   #Loop over clusters
  sekt<-sek[i]
  for(n in 1:11){   #Loop over variables
    if (abs(kts[n,i+1])>sekt) not.normal<-not.normal+1 else normal<-normal+1   #Store if kurtosis departs from normality
  }
}

#LDA model with 11 predictor variables and 4 a priori genetic clusters from
#GENELAND without outliers (LDA1)
library(MASS)
fit.no<-lda(PopAssigment~ HABITAT + SIMPSONS_SDI + PER_SLOPE + ASPECT + ELEVATION + PER_ROC + PER_TR + PER_SH + PER_SS + PER_GR + PER_FO, data=sdfa.25no.data, na.action=na.omit)
predict<-predict(fit.no)   #Predict assignment of obs, store
class<-predict$class   #Store predicted clusters as class
posterior<-predict$posterior   #Store post. prob. used to assign obs.
ld<-predict$x   #Store new data frame of vectors
n<-length(sdfa.25no.data$PopAssigment)   #n = 118, the number of obs.
correct.class<-length(which(class == sdfa.25no.data$PopAssigment))

#(correct.class above: number of obs classified to their observed cluster)
mc.rate.lda<-(1 - correct.class/n)*100   #Calculate APER
cc.rate.lda<-(correct.class/n)*100   #Calculate percentage correct
mc.rate.lda   #View APER
cc.rate.lda   #View percentage correct
scaling.lda<-fit.no$scaling   #Scaling matrix (coefficients)

#Conduct formal Wilks' Lambda test
X<-as.matrix(sdfa.25no.data[,2:12])   #Remove PopAssigment and store as matrix
cluster.manova<-manova(X~sdfa.25no.data$PopAssigment)   #Conduct MANOVA on X by PopAssigment
cluster.wilks<-summary(cluster.manova, test="Wilks")   #Conduct Wilks' test on clusters
cluster.wilks   #View results of formal test

#Conduct randomization tests of LDA1-Randomly assign groups in proportion to
#the observed genetic clusters and calculate APER and Wilks' Lambda
ra.25no.data<-sdfa.25no.data   #New df for random assignment
RA<-ra.25no.data[,1]   #Store genetically clustered assignments
ra.25no.data<-cbind(ra.25no.data,RA)   #Bind RA vector to df to randomize in each rep
n<-length(ra.25no.data[,1])   #Store length of data frame
reps<-100000   #Set number of reps to loop over
mc.rate.ra.lda<-numeric(reps)   #Initialize vector to store APER
wilks.ra.lda1<-numeric(reps)   #Initialize vector to store Wilks' Lambda values
for(i in 1:reps){   #Loop for number of reps
  ra.25no.data$RA<-sample(ra.25no.data$RA,n,replace=FALSE)   #Randomly sample individuals w/o replacement, store new assignment
  #Conduct a LDA based on the randomization
  fit.nora<-lda(RA~ HABITAT + SIMPSONS_SDI + PER_SLOPE + ASPECT + ELEVATION + PER_ROC + PER_TR + PER_SH + PER_SS + PER_GR + PER_FO, data=ra.25no.data, na.action=na.omit)
  predict<-predict(fit.nora)   #Predict assignment of obs, store
  class<-predict$class   #Store predicted clusters as class
  posterior<-predict$posterior   #Store post. prob. used to assign obs.
  ld<-predict$x   #Store new df of vectors
  correct.class<-length(which(class == ra.25no.data$RA))   #Number of obs classified to their assigned cluster
  mc<-(1 - correct.class/n)*100   #Calculate APER for loop
  mc.rate.ra.lda[i]<-mc   #Store APER of loop to vector
  X.ra.lda1<-as.matrix(ra.25no.data[,2:12])   #Remove PopAssigment, store
  cluster.manova<-manova(X.ra.lda1~ra.25no.data$RA)   #Conduct MANOVA on X by random assignment
  cluster.wilks<-summary(cluster.manova, test="Wilks")   #Conduct Wilks' Lambda test on clusters
  wilks.ra.lda1[i]<-cluster.wilks$stats[3]   #Store only Wilks' Lambda value for loop (row 1, column 2 of stats)
  ra.25no.data$RA<-RA   #Reset RA column to original ordering
}
mean(mc.rate.ra.lda)   #Calculate and view mean APER for 100K loops
range(mc.rate.ra.lda)   #Calculate and view range of APER for 100K loops
mean(wilks.ra.lda1)   #Calculate and view mean Wilks' Lambda for 100K loops
range(wilks.ra.lda1)   #Calculate and view range of Wilks' Lambda for 100K loops
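The randomization test of LDA1 above reports the mean and range of the permuted APERs. A minimal sketch of how the same null distribution could also be summarized as a one-sided empirical p-value follows. It is an illustration only: it uses the built-in iris data as a stand-in for the ringtail data, far fewer reps than the 100,000 used above, and the helper name aper.lda is hypothetical rather than part of the original analysis.

```r
library(MASS)  # lda()

# Hypothetical helper: apparent error rate (APER, %) of an LDA fit to
# predictor matrix x and grouping factor g
aper.lda <- function(x, g) {
  fit <- lda(x, grouping = g)
  100 * mean(predict(fit, x)$class != g)
}

set.seed(1)                     # assumed seed, for reproducibility only
x <- as.matrix(iris[, 1:4])     # stand-in predictors (not the thesis data)
g <- iris$Species               # stand-in group labels
obs.aper <- aper.lda(x, g)      # APER for the observed grouping
reps <- 200                     # far fewer than the 100,000 reps used above
# Shuffle the labels (group sizes are preserved) and refit in each rep
null.aper <- replicate(reps, aper.lda(x, sample(g)))
# Share of permutations with an APER at least as low as the observed one;
# the +1 terms count the observed grouping as one of the permutations
p.value <- (sum(null.aper <= obs.aper) + 1) / (reps + 1)
p.value
```

Applied to the thesis objects, obs.aper would correspond to mc.rate.lda and null.aper to mc.rate.ra.lda.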

#Plot the distribution of APERs for randomization tests of LDA1
hist(mc.rate.ra.lda, freq=FALSE, xlab="", xlim=c(0,100), ylab="", ylim=c(0,0.15), main="(a) Distribution of LDA APERs for 100k Randomizations")   #Create density histogram of APERs
density.mcr<-density(mc.rate.ra.lda)   #Store density info from mc.rate.ra.lda
xfit<-seq((0),(100), length=180)
yfit<-dnorm(xfit,mean=mean(mc.rate.ra.lda), sd=sd(mc.rate.ra.lda))
lines(xfit,yfit, col="red", lwd=2)   #Overlay normal curve
lines(c(mc.rate.lda,mc.rate.lda),c(0,0.1))
text(mc.rate.lda-15, y=.125, labels="APER of Genetic Clusters", pos=4)

#QDA model with 10 predictor variables and 4 a priori genetic clusters from
#GENELAND without outliers
library(MASS)
fit.qno<-qda(PopAssigment~ SIMPSONS_SDI + PER_SLOPE + ASPECT + ELEVATION + PER_ROC + PER_TR + PER_SH + PER_SS + PER_GR + PER_FO, data=sdfa.25no.data, na.action=na.omit)
predict<-predict(fit.qno, method="predictive")   #Predict assignment of obs and store
class<-predict$class   #Store predicted clusters
posterior<-predict$posterior   #Store post. prob. used to assign obs to classes
n<-length(sdfa.25no.data$PopAssigment)   #n = 118, the number of obs
correct.class<-length(which(class == sdfa.25no.data$PopAssigment))   #Number of obs classified to their observed cluster
mc.rate.qda<-(1 - correct.class/n)*100   #Calculate APER
cc.rate.qda<-(correct.class/n)*100   #Calculate percentage correct
mc.rate.qda   #View APER
cc.rate.qda   #View percentage correct

#Conduct randomization tests of QDA-Randomly assign groups in proportion to
#the observed genetic clusters and calculate APER
ra.25no.data<-sdfa.25no.data   #Create new df for random assignment
RA<-ra.25no.data[,1]   #Store genetically clustered assignments
ra.25no.data<-cbind(ra.25no.data,RA)   #Bind RA vector as column to df to randomize in each rep
n<-length(ra.25no.data[,1])   #n=118, store length of data frame
reps<-100000   #Set number of reps to loop over
mc.rate.ra.qda<-numeric(reps)   #Initialize vector to store APERs
for(i in 1:reps){   #Loop over number of reps
  ra.25no.data$RA<-sample(ra.25no.data$RA,n,replace=FALSE)   #Randomly sample w/o replacement each individual and store new assignment
  #Conduct QDA on the random assignment
  fit.qno.ra<-qda(RA~ SIMPSONS_SDI + PER_SLOPE + ASPECT + ELEVATION + PER_ROC + PER_TR + PER_SH + PER_SS + PER_GR + PER_FO, data=ra.25no.data, na.action=na.omit)
  predict<-predict(fit.qno.ra, method="predictive")   #Predict assignment of observations and store
  class<-predict$class   #Store predicted clusters
  posterior<-predict$posterior   #Store post. prob. used to assign obs to classes
  correct.class<-length(which(class == ra.25no.data$RA))   #Number of obs classified to their assigned cluster
  mc<-(1 - correct.class/n)*100   #Store APER for current loop
  mc.rate.ra.qda[i]<-mc   #Store APER for loop into a vector
  ra.25no.data$RA<-RA   #Reset RA column to its original ordering
}
mean(mc.rate.ra.qda)   #Calculate and view mean APER
range(mc.rate.ra.qda)   #Calculate and view range of APER


#Plot the distribution of APERs for randomization tests of QDA
hist(mc.rate.ra.qda, freq=FALSE, xlab="", ylab="", xlim=c(0,100), ylim=c(0,0.15), main="(b) Distribution of QDA APERs for 100k Randomizations")   #Create density histogram of APERs
density.mcr<-density(mc.rate.ra.qda)   #Store density info from mc.rate.ra.qda
xfit<-seq((0),(100), length=180)
yfit<-dnorm(xfit,mean=mean(mc.rate.ra.qda), sd=sd(mc.rate.ra.qda))
lines(xfit,yfit, col="red", lwd=2)   #Overlay normal curve
lines(c(mc.rate.qda,mc.rate.qda),c(0,0.1))
text(mc.rate.qda-15, y=.125, labels="APER of Genetic Clusters", pos=4)

#Create training and validation data sets and test predictive power of LDA using
#cross-validation
x2.r<-cbind(c(1:length(x2[,1])),x2)   #Add a column of 1:length of cluster to x2
x3.r<-cbind(c(1:length(x3[,1])),x3)   #Add a column of 1:length of cluster to x3
x4.r<-cbind(c(1:length(x4[,1])),x4)   #Add a column of 1:length of cluster to x4
x5.r<-cbind(c(1:length(x5[,1])),x5)   #Add a column of 1:length of cluster to x5
#Randomly sample ~80% of each cluster, store id, and save data as training data set
#(the cluster matrices carry no PopAssigment column, so it is added back here)
x2.id<-sample(x2.r[,1],47,replace=FALSE); x2.t<-data.frame(PopAssigment=2, x2[x2.id,,drop=FALSE])
x3.id<-sample(x3.r[,1],15,replace=FALSE); x3.t<-data.frame(PopAssigment=3, x3[x3.id,,drop=FALSE])
x4.id<-sample(x4.r[,1],22,replace=FALSE); x4.t<-data.frame(PopAssigment=4, x4[x4.id,,drop=FALSE])
x5.id<-sample(x5.r[,1],10,replace=FALSE); x5.t<-data.frame(PopAssigment=5, x5[x5.id,,drop=FALSE])
sdfa.25no.training<-rbind(x2.t, x3.t, x4.t, x5.t)
sdfa.25no.training$PopAssigment<-factor(sdfa.25no.training$PopAssigment)
#Extract only those obs that were not randomly selected and save as validation data
x2.idv<-c(1:length(x2[,1]))[-c(x2.id)]; x2.v<-data.frame(PopAssigment=2, x2[x2.idv,,drop=FALSE])
x3.idv<-c(1:length(x3[,1]))[-c(x3.id)]; x3.v<-data.frame(PopAssigment=3, x3[x3.idv,,drop=FALSE])
x4.idv<-c(1:length(x4[,1]))[-c(x4.id)]; x4.v<-data.frame(PopAssigment=4, x4[x4.idv,,drop=FALSE])
x5.idv<-c(1:length(x5[,1]))[-c(x5.id)]; x5.v<-data.frame(PopAssigment=5, x5[x5.idv,,drop=FALSE])
sdfa.25no.validation<-rbind(x2.v, x3.v, x4.v, x5.v)
sdfa.25no.validation$PopAssigment<-factor(sdfa.25no.validation$PopAssigment, levels=levels(sdfa.25no.training$PopAssigment))
#Conduct LDA on training dataset to develop predictive model
fit.not<-lda(PopAssigment~ HABITAT + SIMPSONS_SDI + PER_SLOPE + ASPECT + ELEVATION + PER_ROC + PER_TR + PER_SH + PER_SS + PER_GR + PER_FO, data=sdfa.25no.training, na.action=na.omit)
predict<-predict(fit.not)   #Predict assignment of observations and store
class<-predict$class   #Store predicted clusters as class
posterior<-predict$posterior   #Store post. prob. used to assign obs to classes
n<-length(sdfa.25no.training$PopAssigment)   #n = 94, the number of obs
correct.class<-length(which(class == sdfa.25no.training$PopAssigment))   #Number of obs classified to their observed cluster
mc.rate.training.lda<-(1 - correct.class/n)*100   #Calculate APER
cc.rate.training.lda<-(correct.class/n)*100   #Calculate percentage correct
mc.rate.training.lda   #View APER
cc.rate.training.lda   #View percentage correct
#Cross-validation- use prediction LDA from training; the confusion matrix below
#is used to calculate the APER for the validation data
table(actual=sdfa.25no.validation$PopAssigment, predicted=predict(fit.not, newdata=sdfa.25no.validation)$class)

#Create training and validation data sets and test predictive power of QDA using
#cross-validation
#Randomly sample ~80% of each cluster, store id, and save data as training data set
x2.id2<-sample(x2.r[,1],47,replace=FALSE); x2.t2<-data.frame(PopAssigment=2, x2[x2.id2,,drop=FALSE])
x3.id2<-sample(x3.r[,1],15,replace=FALSE); x3.t2<-data.frame(PopAssigment=3, x3[x3.id2,,drop=FALSE])
x4.id2<-sample(x4.r[,1],22,replace=FALSE); x4.t2<-data.frame(PopAssigment=4, x4[x4.id2,,drop=FALSE])
x5.id2<-sample(x5.r[,1],11,replace=FALSE); x5.t2<-data.frame(PopAssigment=5, x5[x5.id2,,drop=FALSE])
sdfa.25no.training2<-rbind(x2.t2, x3.t2, x4.t2, x5.t2)
sdfa.25no.training2$PopAssigment<-factor(sdfa.25no.training2$PopAssigment)
#Extract only those obs that were not randomly selected and save as validation data
x2.idv2<-c(1:length(x2[,1]))[-c(x2.id2)]; x2.v2<-data.frame(PopAssigment=2, x2[x2.idv2,,drop=FALSE])
x3.idv2<-c(1:length(x3[,1]))[-c(x3.id2)]; x3.v2<-data.frame(PopAssigment=3, x3[x3.idv2,,drop=FALSE])
x4.idv2<-c(1:length(x4[,1]))[-c(x4.id2)]; x4.v2<-data.frame(PopAssigment=4, x4[x4.idv2,,drop=FALSE])
x5.idv2<-c(1:length(x5[,1]))[-c(x5.id2)]; x5.v2<-data.frame(PopAssigment=5, x5[x5.idv2,,drop=FALSE])
sdfa.25no.validation2<-rbind(x2.v2, x3.v2, x4.v2, x5.v2)
sdfa.25no.validation2$PopAssigment<-factor(sdfa.25no.validation2$PopAssigment, levels=levels(sdfa.25no.training2$PopAssigment))
#Conduct QDA on training dataset to develop predictive model
fit.qt<-qda(PopAssigment~ SIMPSONS_SDI + PER_SLOPE + ASPECT + ELEVATION + PER_ROC + PER_TR + PER_SH + PER_SS + PER_GR + PER_FO, data=sdfa.25no.training2, na.action=na.omit)
predict<-predict(fit.qt, method="predictive")   #Predict assignment of obs, store
class<-predict$class   #Store predicted clusters as class
posterior<-predict$posterior   #Store post. prob. used to assign obs to classes
n<-length(sdfa.25no.training2$PopAssigment)   #n = 95, the number of obs
correct.class<-length(which(class == sdfa.25no.training2$PopAssigment))
mc.rate.training.qda2<-(1 - correct.class/n)*100   #Calculate APER
cc.rate.training.qda2<-(correct.class/n)*100   #Calculate percentage correct
mc.rate.training.qda2   #View APER
cc.rate.training.qda2   #View percentage correct
#Cross-validation- use prediction QDA from training; the confusion matrix below
#is used to calculate the APER for the validation data
table(actual=sdfa.25no.validation2$PopAssigment, predicted=predict(fit.qt, newdata=sdfa.25no.validation2)$class)
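The cross-validation steps above end with a confusion table, from which the validation APER still has to be worked out. The sketch below shows one way that calculation could be done, together with a stratified split of about 80% per group. It is a sketch under stated assumptions: it runs on the built-in iris data, not the ringtail data, and the object names (train.id, conf, aper.valid) are hypothetical.

```r
library(MASS)  # lda()

set.seed(42)                    # assumed seed, for reproducibility only
dat <- iris                     # stand-in data (not the thesis data)

# Stratified split: sample ~80% of the row numbers within each group
rows.by.group <- split(seq_len(nrow(dat)), dat$Species)
train.id <- unlist(lapply(rows.by.group,
                          function(rows) sample(rows, round(0.8 * length(rows)))))
train <- dat[train.id, ]        # training set
valid <- dat[-train.id, ]       # validation set (rows not sampled)

# Fit on the training set only, then predict the held-out rows
fit <- lda(Species ~ ., data = train)
conf <- table(actual = valid$Species,
              predicted = predict(fit, newdata = valid)$class)

# Validation error rate (%): share of held-out obs off the table diagonal
aper.valid <- 100 * (1 - sum(diag(conf)) / sum(conf))
conf
aper.valid
```

Because both table margins are factors with the same levels, the table is square and its diagonal holds the correctly classified observations.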


#Select a random sample of cluster 2 equal in size to the mean of the other 3 clusters
x2.rt<-cbind(c(1:length(x2[,1])),x2)   #Bind a column of row numbers to the cluster 2 matrix
x2.idt<-sample(x2.rt[,1], round(sum(n.rows[2:4])/3), replace=FALSE)   #Sample 20 of the rows, store row numbers
x2.20t<-x2[c(x2.idt),]   #Extract only those rows, save as new matrix (x2 has no index column to remove)
x2.20t<-as.matrix(x2.20t[,1:n.vars])   #Store as matrix
s2.t<-cov(x2.20t)   #Create and store covariance matrix
c2.t<-cor(x2.20t)   #Create and store correlation matrix

#Testing homogeneity of covariance matrices with random sample of cluster 2
library(psych)
cortest(c2.t,c3, nrow(x2.20t), n3); cortest(c2.t,c4, nrow(x2.20t), n4)
cortest(c2.t,c5, nrow(x2.20t), n5); cortest(c3,c4, n3, n4)
cortest(c3,c5, n3, n5); cortest(c4,c5, n4, n5)

#LDA model with 11 predictor variables and 4 a priori genetic clusters from
#GENELAND without outliers and random sample of cluster 2 (LDA2)
sdfa.35no.data<-sdfa.25no.data[-c(1:58),]   #Remove cluster 2 (rows 1-58) from df, store
x2.20t<-as.data.frame(x2.20t)   #Store random sample of cluster 2 as df
PopAssigment<-factor(rep(2,length(x2.20t[,1])), levels=levels(sdfa.25no.data$PopAssigment))   #Create vector of PopAssigment
x2.20t<-cbind(PopAssigment,x2.20t)   #Bind column of PopAssigment
sdfa.2n5no.data<-rbind(x2.20t, sdfa.35no.data[,colnames(x2.20t)])   #Attach sample of cluster 2 to df, keeping matching columns
sdfa.2n5no.data$PopAssigment<-as.factor(sdfa.2n5no.data$PopAssigment)
library(MASS)
fit.25sno<-lda(PopAssigment~ HABITAT + SIMPSONS_SDI + PER_SLOPE + ASPECT + ELEVATION + PER_ROC + PER_TR + PER_SH + PER_SS + PER_GR + PER_FO, data=sdfa.2n5no.data, na.action=na.omit)
predict<-predict(fit.25sno)   #Predict assignment of observations and store
class<-predict$class   #Store predicted clusters as class
posterior<-predict$posterior   #Store post. prob. used to assign obs to classes
ld<-predict$x   #Store new data frame of vectors from predict
n<-length(sdfa.2n5no.data$PopAssigment)   #n = 80, the number of obs
correct.class<-length(which(class == sdfa.2n5no.data$PopAssigment))   #Number of obs classified to their observed cluster
mc.rate.lda.s2<-(1 - correct.class/n)*100   #Calculate APER
cc.rate.lda.s2<-(correct.class/n)*100   #Calculate percentage correct
mc.rate.lda.s2   #View APER
cc.rate.lda.s2   #View percentage correct
scaling.lda.s2<-fit.25sno$scaling   #Scaling matrix of coefficients

#Conduct formal Wilks' Lambda test
X2<-as.matrix(sdfa.2n5no.data[,2:12])  #Remove PopAssigment and store as X2
cluster.manova2<-manova(X2~sdfa.2n5no.data$PopAssigment)  #Conduct MANOVA on X2 by PopAssigment
cluster.wilks2<-summary(cluster.manova2, test="Wilks")  #Conduct Wilks test on clusters
cluster.wilks2  #View results of formal test

#Conduct randomization tests of LDA2-Randomly assign groups in proportion to
#the observed genetic clusters and calculate APER and Wilks' Lambda
ra.2n5no.data<- sdfa.2n5no.data  #Store data as new df for Random Assign
mc.rate.ra.lda.s2<-NULL  #Initialize empty vector to store APER
wilks.ra.lda2<-NULL  #Initialize empty vector to store Wilks' Lambda
RA<-ra.2n5no.data[,1]  #Store observed population assignments
ra.2n5no.data<-cbind(ra.2n5no.data,RA)  #Bind RA as a column to df to randomize
n<-length(ra.2n5no.data[,1])  #n=80, store length of data frame
reps<-100000  #Set number of reps to loop over
for(i in 1:reps){  #Loop over for number of reps
  ra.2n5no.data$RA<- sample(ra.2n5no.data$RA,n,replace=FALSE)
    #Randomly sample w/o replacement each
    #individual and store new assignment
  fit.2n5no.ra <- lda(RA~ HABITAT + SIMPSONS_SDI + PER_SLOPE + ASPECT + ELEVATION +
    PER_ROC + PER_TR + PER_SH + PER_SS + PER_GR + PER_FO,
    data=ra.2n5no.data, na.action="na.omit")  #Conduct LDA on the random assignment
  predict <- predict(fit.2n5no.ra)  #Predict assignment of observations and store
  class <- predict$class  #Store predicted clusters as class
  posterior <- predict$posterior  #Store post. prob. used to assign obs to classes
  correct.class <- length(which(class == ra.2n5no.data$RA))
    #Determine which obs were classified to their
    #randomly assigned cluster and store total number
  mc <- (1 - correct.class/n)*100  #Calculate APER for current loop
  mc.rate.ra.lda.s2<-append(mc.rate.ra.lda.s2, mc)  #Store APER into a vector
  X.ra.lda2<-as.matrix(ra.2n5no.data[,2:12])  #Remove PopAssigment, store
  cluster.manova<-manova(X.ra.lda2~ra.2n5no.data$RA)  #Conduct MANOVA on matrix by RA
  cluster.wilks<-summary(cluster.manova, test="Wilks")  #Conduct Wilks test on clusters
  wilks.ra.lda2<-append(wilks.ra.lda2, cluster.wilks$stats[3])
    #Store results of formal test for loop to vector
  ra.2n5no.data$RA<- RA  #Reset randomizations to original ordering
}
mean(mc.rate.ra.lda.s2)  #Calculate and view mean APER
range(mc.rate.ra.lda.s2)  #Calculate and view range of APER
mean(wilks.ra.lda2)  #Calculate and view mean Wilks
range(wilks.ra.lda2)  #Calculate and view range of Wilks
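
The mean and range of the randomized APERs can also be condensed into an empirical p-value: the proportion of random assignments whose APER is as low as, or lower than, the APER of the observed genetic clusters. The following is only a minimal sketch under stated assumptions, not the analysis reported in this thesis. It uses the built-in iris data in place of the habitat data, far fewer reps, and hypothetical names (aper, obs.aper, perm.aper, p.value). The observed arrangement is counted as one of the randomizations, hence the +1 in the numerator and denominator.

```r
library(MASS)                                  #Provides lda()
set.seed(1)                                    #Make the illustration reproducible
aper <- function(groups, X){                   #Hypothetical helper: APER (%) of an LDA
  fit <- lda(X, grouping=groups)               #Fit LDA of groups on predictor matrix X
  mean(predict(fit, X)$class != groups)*100    #Percent of obs misclassified
}
X <- as.matrix(iris[,1:4])                     #Stand-in predictors (assumption)
groups <- iris$Species                         #Stand-in cluster assignments (assumption)
obs.aper <- aper(groups, X)                    #APER for the observed grouping
reps <- 200                                    #Far fewer reps than 100000, for speed
perm.aper <- numeric(reps)                     #Preallocate vector of randomized APERs
for(i in 1:reps){
  perm.aper[i] <- aper(sample(groups), X)      #Shuffle labels w/o replacement, refit
}
p.value <- (sum(perm.aper <= obs.aper) + 1)/(reps + 1)  #One-sided empirical p-value
p.value                                        #View empirical p-value
```

Preallocating perm.aper avoids repeatedly growing a vector with append(), which matters when the number of reps is large.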

#Plot the distribution of APERs for randomization tests of LDA2
hist(mc.rate.ra.lda.s2, freq=FALSE, ylab="DENSITY", xlab="APER",
  xlim=c(0,100), ylim=c(0,0.15))  #Create density histogram of APERs
density.mcr<-density(mc.rate.ra.lda.s2)  #Store density info from mc.rate.ra
xfit<-seq(0, 100, length=180)
yfit<-dnorm(xfit, mean=mean(mc.rate.ra.lda.s2), sd=sd(mc.rate.ra.lda.s2))
lines(xfit, yfit, col="red", lwd=2)  #Overlay Normal Curve
lines(c(mc.rate.lda.s2,mc.rate.lda.s2), c(0,0.1))  #Mark APER of observed clusters
text(mc.rate.lda.s2-15, y=.125, labels="APER of Genetic Clusters", pos=4)

#Graphing of the 2 most influential axes, LD1 and LD2 for LDA2 (stripcharts)
PopAssigment<-sdfa.2n5no.data$PopAssigment  #Store cluster assignments
predict <- predict(fit.25sno)  #Store predict
ld <- predict$x
ld<-cbind(PopAssigment, as.data.frame(ld))  #Add the pop assignment back
  #to ld and store all as df
par(mfrow=c(2,1))
stripchart(ld$LD1~ld$PopAssigment, xlab="LD1", ylab="Clusters")
  #Stripchart of LD1 by cluster-means noted by X
means1<-tapply(ld$LD1, ld$PopAssigment, mean, na.rm=TRUE)
points(x=means1, y=1:length(means1), pch="X", col="red")
stripchart(ld$LD2~ld$PopAssigment, xlab="LD2", ylab="Clusters")
  #Stripchart of LD2 by cluster-means noted by X
means2<-tapply(ld$LD2, ld$PopAssigment, mean, na.rm=TRUE)
points(x=means2, y=1:length(means2), pch="X", col="red")

#Graph individuals against the 2 most influential axes, LD1 and LD2 for LDA2
#(scatter plots centered on 0)
par(mfrow=c(1,1))
l2<-ld[PopAssigment==2,]; l2<-length(l2[,1])  #Subset cluster 2, store length
l3<-ld[PopAssigment==3,]; l3<-length(l3[,1])  #Subset cluster 3, store length
l4<-ld[PopAssigment==4,]; l4<-length(l4[,1])  #Subset cluster 4, store length
l5<-ld[PopAssigment==5,]; l5<-length(l5[,1])  #Subset cluster 5, store length

c.colors<-c("blue","green","red","orange")  #Create vector of colors-1/cluster
c.reps<-c(l2, l3, l4, l5)  #Create vector-no. of each color
c.pch<- c("2","3","4","5")  #Create vector-plot characters
rm(l2,l3,l4,l5)  #Remove all length objects
c.colors<-rep(c.colors, c.reps)  #Extend colors to individuals
c.pch<-rep(c.pch, c.reps)  #Extend symbols to individuals
plot(ld$LD2~ld$LD1, xlim=c(-5,5), ylim=c(-5,5), xlab="LD1", ylab="LD2",
  pch=c.pch, col=c.colors)  #Plot individuals by LD1 & LD2
abline(0,0); abline(v=0)  #Add lines intersecting at 0

#Create 3D plots of individuals against all 3 LDs for LDA2
#(Both standard and interactive)
library(lattice)
cloud(ld$LD3~ld$LD1 + ld$LD2, pch=c.pch, col=c.colors,
  xlab="LD1", ylab="LD2", zlab="LD3")  #Standard 3D scatter plot
library(Rcmdr)
ld$PopAssigment<-factor(ld$PopAssigment)  #PopAssigment to a factor
scatter3d(ld$LD3, ld$LD1, ld$LD2, xlab="LD3", ylab="LD1", zlab="LD2",
  axis.scales=FALSE, axis.col=rep(c("black"),3), point.col=c.colors,
  surface=FALSE, groups=ld$PopAssigment, ellipsoid=TRUE, sphere.size=0.75,
  revolutions=2)  #Create interactive 3D plot with ellipsoids

#All of the graphing code can be used on LDA1 by changing the lda object used

#Calculation of APER using a nonparametric test for comparison with parametric
#techniques-Code for Classification tree models (Binary Recursive Classification)
library(rpart)
tree.25no <- rpart(PopAssigment~ SIMPSONS_SDI + PER_SLOPE + ASPECT + ELEVATION +
  PER_ROC + PER_TR + PER_SH + PER_SS + PER_GR + PER_FO,
  data=sdfa.25no, na.action="na.omit",
  method="class")  #method="class" fits a classification tree even if
  #PopAssigment is stored as numeric
predict<-predict(tree.25no)  #Store predicted class probabilities
table(actual=sdfa.25no.data$PopAssigment,
  predicted=predict(tree.25no, newdata=sdfa.25no.data, type="class"))
  #Cross-tabulate observed vs. predicted clusters;
  #off-diagonal counts are the misclassifications used for APER
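
The cross-tabulation of observed against predicted clusters can be reduced to a single APER for comparison with the LDA values: correctly classified observations fall on the diagonal of the table, so APER is 100 times one minus the diagonal proportion. A minimal sketch, assuming the built-in iris data as a stand-in for the habitat data and hypothetical object names (tree.ex, pred.ex, conf.ex, aper.tree):

```r
library(rpart)                                            #Classification trees
tree.ex <- rpart(Species~., data=iris, method="class")    #Stand-in tree (assumption)
pred.ex <- predict(tree.ex, newdata=iris, type="class")   #Predicted class per obs
conf.ex <- table(actual=iris$Species, predicted=pred.ex)  #Confusion matrix
aper.tree <- (1 - sum(diag(conf.ex))/sum(conf.ex))*100    #APER (%) from the table
aper.tree                                                 #View APER
```

Like the LDA APER, this value is computed on the same observations used to fit the tree.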
