
PROC QTL

A SAS Procedure for Mapping Quantitative Trait Loci

Version 2.0

The correct bibliographic citation for this program is Zhiqiu Hu and Shizhong Xu (2009). PROC QTL - A SAS Procedure for Mapping Quantitative Trait Loci. International Journal of Plant Genomics 2009: 3 doi:10.1155/2009/141234.

PROC QTL Version 2.0 Copyright 2008, University of California, Riverside, CA, USA All rights reserved. University of California, Riverside 900 University Ave., Riverside, CA 92521

Contents
OVERVIEW: QTL PROCEDURE
GETTING STARTED: QTL PROCEDURE
SYNTAX: QTL PROCEDURE
    PROC QTL Statement
    BY Statement
    CLASS Statement
    ESTIMATE Statement
    GENOTYPE Statement
    MARKER Statement
    MATINGTYPE Statement
    MODEL Statement
    RANGE Statement
    WEIGHT Statement
DETAILS: QTL PROCEDURE
EXAMPLES: QTL PROCEDURE
    Example 1: QTL mapping for continuous trait
    Example 2: QTL mapping for discrete traits
    Example 3: QTL mapping in a four-way cross design
    Example 4: QTL mapping via the Bayesian method
    Example 5: Estimating genomewide epistatic effects via the empirical Bayesian method
    Example 6: Joint mapping of QTL for multiple traits
    Example 7: Estimating genomewide QTL effects for discrete traits that follow some special distributions
    Example 8: Composite interval mapping using PROC QTL
    Example 9: Permutation for the Bayesian shrinkage analysis
REFERENCES

Overview: QTL procedure


PROC QTL is a user-defined SAS procedure for mapping quantitative trait loci (QTL). The program is written in C++ and interfaces with the SAS system through the SAS/Toolkit software (SAS INSTITUTE INC 1991). Because it is not a built-in SAS procedure, users must obtain a copy of the PROC QTL executable file and install it on their own computers before the procedure can be executed. A regular SAS license is required before PROC QTL can be installed. Once PROC QTL is installed, users call it exactly as they call any built-in SAS procedure. PROC QTL differs from stand-alone QTL mapping software packages, such as QTL Cartographer (WANG et al. 2007), in that it must be executed within the SAS system to perform all QTL analyses. It behaves like a parasite of the SAS system, except that it does no harm to SAS or to the computers that run it. The SAS system provides services to the procedure such as statement processing, data set management and memory allocation. PROC QTL can read SAS data sets and data views, perform data analysis, print results, and create other SAS data sets. Performing QTL mapping under SAS rather than with a stand-alone program has several advantages:

Familiarity: Using PROC QTL is easy for SAS users because they already understand data input, data manipulation, and general SAS syntax.

Convenience: A program incorporated into the SAS system lets you keep all your programming tools in one place.

Integration: The data used by PROC QTL can easily be sorted, printed, and analyzed with other SAS procedures during a single job.

Special capabilities: Special features, such as BY-group processing and WEIGHT variable handling, can be used.

Reduced documentation: Only the new language statements, the output of the procedure, and any special calculations in the procedure need to be explained.

Getting started: QTL procedure


QTL mapping usually requires the following information: the phenotypic values of the quantitative trait of interest, the genotypes of molecular markers, and the linkage map of the markers. Users need to create two SAS datasets: the primary dataset and the map dataset. The primary dataset should contain the phenotypic values, the marker genotypes and all other variables relevant to the QTL mapping. The map dataset contains only three variables: the marker name, the marker position and the chromosome. The following example shows how to create the primary SAS dataset for a BC (backcross) population. The first variable, y, is the phenotypic value, and m1-m10 are the genotypes of ten markers. In this example, A and B indicate the two genotypes per locus and U indicates a missing genotype.
/* Program 2-1 */

data one;
  input y (m1-m10)($);
  cards;
19.87 U B B A A A A A A A
17.74 B B A A A A A A A A
21.74 A A A A A A A A A A
18.76 B A A U A B B B B B
20.39 A A A A A A A A A A
21.75 B B A A A U A A A A
18.44 A B B B B B B B B B
19.77 A A A A A A A B B B
22.37 A A A A A A A A A A
16.89 A A A B B B B B U B
20.06 A A A A A A A A A A
20.53 A U B B B B B B B A
23.04 A A A A A A A B A A
20.11 B U B B B B B B B A
20.95 A A A A A A A A A A
20.84 A A A A A A A A A A
20.30 B B B U A A A A B B
21.29 A A A A A A A A B B
19.19 A A A A A A A A A A
22.43 B A A A A A A A A A
20.29 A A A A A A A A A A
19.31 B B B B B B B B B B
18.75 B B A A B B B B B B
19.75 A A B B B B B B B B
21.26 A B B B B B B B B B
20.35 A A A A A A A A A A
16.93 B B B B B B B B B B
20.21 U A A U U B B B B B
20.78 A B B B B A A B B B
20.72 B B B B B B A U B B
23.87 A A A A U A A A B B
18.26 B U U B B B B B B B
21.01 A A A A A A A A A A
20.47 B B B B B U B B B B
18.54 B B B B B B B U B B
20.46 B B B U B B B B U A
21.05 B B B A A A U B B B
21.87 A A A A A A U A A A
19.57 B B B B B B A B U B
21.12 B B B B B B B U U U
18.46 B B B B B B B B B B
19.09 B B B B B A B B B B
19.68 A A A A A B B A A A
17.22 A A A A B B B B B B
20.68 U B B B U B B B B A
20.61 B B U A A A A A A B
21.74 A A A A A U A A A A
20.08 B B B B B B B B B B
21.40 A A U A B B B B B A
18.80 B B B B B B B B B B
18.89 A A A A A A A A A A
21.26 U B B B B B A A A A
18.56 A B B B B B U B U B
20.47 B B B B U B B B B B
21.38 B A A A A A A B B B
21.23 A A A A A A A A A A
19.42 B B B B B B B A A A
18.71 B B B B B B B B B U
18.13 B B B B B B B B B B
18.59 A A A B B B B B B B
23.43 A A A A A A A A A A
20.02 A A A A A A U A A B
16.68 A A A A A A U A A A
18.26 A A A A A B B A A A
19.68 A A A U A A A A A A
18.18 B B B B B B B B B B
21.74 U A A A A A A A A A
18.02 A U A A A B B B B B
17.67 B B B B B B B U B B
22.83 A A A A A A A A A U
19.76 A A A A B B B B B B
22.01 B B B B B B B B B B
21.54 B B U B A A A B B B
16.64 B B B B B B B B B B
20.16 B B B B A A B B B B
19.73 A B B B A U B B B B
21.58 B B B B B B B B B A
20.95 A A A A A A A A A A
22.45 A A A A A A A A A A
19.96 A A A A A A A A B B
24.17 A A U A A A A A A B
22.63 B B B B B B B B B B
17.56 B B B B B B B B B B
19.60 A A A A A A A A A A
19.93 A A A A A A A A U A
19.12 B B B B B B B B B A
20.96 B B B B B B B B B B
21.89 B B B B B B B B B U
19.77 B A U A A A A A A B
17.51 A A A A A A B A A A
20.84 U U A A A A A B U U
19.62 B A A A A A A A A A
19.57 A A A A B B B B B B
21.29 B A A A U A A A A A
18.73 U B B B B B B B B B
22.34 A A A A A A A A A A
19.23 A A A A A A A A A A
23.35 A A A A A A A A A A
20.06 B B B B B U B B B B
17.84 B A A A A A A A A A
;
run;

The following dataset is an example of the map dataset. The three variables must be entered in this order: marker, position and chromosome. The marker variable stores the marker names and must be of character type. The marker names must match the marker variables defined in the primary dataset.
/* Program 2-2 */

data two;
  input marker $ position chromoso;
  cards;
M1   0    1
M2  10.2  1
M3  18.6  1
M4  25.8  1
M5  33.9  1
M6  42.1  1
M7  51.8  1
M8  62.0  1
M9  71.4  1
M10 80.3  1
;
run;
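Because the marker names in the map dataset must match the marker variables in the primary dataset, it can help to compare the two before calling PROC QTL. The following is a minimal sketch using only Base SAS; the dataset names ONE and TWO come from Programs 2-1 and 2-2, while the work dataset VARS and the title text are assumptions made for illustration.

```sas
/* List the variable names of the primary dataset (VARS is a hypothetical work dataset). */
proc contents data=one out=vars(keep=name) noprint;
run;

/* Report map markers that have no variable of the same name in the primary dataset. */
proc sql;
  title "Map markers without a matching variable in dataset ONE";
  select marker
    from two
    where upcase(marker) not in (select upcase(name) from vars);
quit;
title;
```

If every marker matches, the query selects no rows and SAS writes a note to that effect in the log.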

The following code shows how to invoke PROC QTL.


/* Program 2-3 */

proc qtl data=one map=two out=result method="ml" step=5/fixed;
  model y =;
  matingtype "BC";
  genotype A1A1='A' A1A2='B';
  estimate "A"= .5 -.5;
run;

The following figure shows the display of the output generated by PROC QTL.
The QTL Procedure

Mapping information:
    Mapping method:                      Maximum likelihood method
    Step:                                5.00 centiMorgan(cM) / Fixed
    Maximum number of iterations:        100
    Convergence error:                   1.000E-08

Population Information:
    Sample size:                         100
    Number of non-QTL effects:           0
    Number of markers:                   10
    Number of traits:                    1
    Mating type:                         Backcross

Marker Information:
    AA:                                  486
    AB:                                  460
    Number of missing marker genotypes:  54
    Total number of marker genotypes:    1000
    Missing marker proportion:           5.40%

QTL Effect(s) defined:
    A : 0.50 AA 0.50 AB
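The marker information block of this listing can be cross-checked directly against the primary dataset. The sketch below, which assumes the dataset ONE of Program 2-1 and introduces the hypothetical names COUNTS, n_A, n_B, n_U, n_total and pct_missing, tallies the genotype codes over all ten markers in a single data step.

```sas
/* Tally genotype codes A, B and U over markers m1-m10 in dataset ONE. */
data counts(keep=n_A n_B n_U n_total pct_missing);
  set one end=last;
  array m{10} $ m1-m10;
  do i = 1 to 10;
    if m{i} = 'A' then n_A + 1;        /* sum statements retain their totals across rows */
    else if m{i} = 'B' then n_B + 1;
    else if m{i} = 'U' then n_U + 1;
  end;
  if last then do;
    n_total = n_A + n_B + n_U;
    pct_missing = 100 * n_U / n_total; /* missing marker proportion in percent */
    output;
  end;
run;

proc print data=counts noobs;
run;
```

The totals should agree with the AA, AB and missing-genotype counts reported by PROC QTL, since A and B were mapped to A1A1 and A1A2 in the GENOTYPE statement.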

The main result of PROC QTL is a stored SAS dataset named RESULT. This output dataset has 21 observations and 12 variables as shown below.
Table 2.1 Output of Program 2-3

Obs  trait  chr  marker  position  n_Iter  conv_err  LRT     Wald    ve     intercpt  A      var_1
  1  y      1    M1      0         2       4.66E-11   3.598   3.695  2.604  20.114    0.621  0.104
  2  y      1            5         2       7.30E-09   4.492   5.139  2.569  20.108    0.728  0.103
  3  y      1            10        2       8.19E-10   4.638   4.799  2.577  20.104    0.705  0.104
  4  y      1    M2      10.2      2       1.97E-10   4.627   4.750  2.578  20.103    0.702  0.104
  5  y      1            15.2      3       1.12E-10   4.295   4.618  2.581  20.096    0.694  0.104
  6  y      1    M3      18.6      2       3.80E-09   3.800   3.884  2.600  20.092    0.640  0.105
  7  y      1            23.6      3       1.82E-10   6.311   6.792  2.529  20.079    0.835  0.103
  8  y      1    M4      25.8      3       5.86E-12   7.223   7.495  2.512  20.073    0.875  0.102
  9  y      1            30.8      3       6.65E-10  10.051  11.210  2.428  20.076    1.049  0.098
 10  y      1    M5      33.9      2       1.60E-09  11.112  11.764  2.416  20.081    1.070  0.097
 11  y      1            38.9      3       1.04E-10  14.966  17.012  2.307  20.103    1.255  0.093
 12  y      1    M6      42.1      2       3.81E-13  16.588  18.065  2.287  20.108    1.286  0.092
 13  y      1            47.1      3       1.75E-09  20.415  24.227  2.174  20.117    1.452  0.087
 14  y      1    M7      51.8      3       1.92E-11  22.463  25.667  2.149  20.119    1.485  0.086
 15  y      1            56.8      4       6.46E-10  16.905  21.091  2.230  20.144    1.372  0.089
 16  y      1            61.8      2       8.57E-09   9.509  10.184  2.451  20.160    1.001  0.099
 17  y      1    M8      62        2       2.68E-10   9.211   9.685  2.462  20.160    0.979  0.099
 18  y      1            67        3       3.55E-09   9.316  10.813  2.436  20.169    1.031  0.098
 19  y      1    M9      71.4      3       1.26E-12   7.802   8.147  2.497  20.183    0.909  0.101
 20  y      1            76.4      3       2.08E-09   7.990   9.248  2.471  20.166    0.960  0.100
 21  y      1    M10     80.3      2       1.97E-09   6.909   7.273  2.517  20.156    0.858  0.101

The 12 variables in the output dataset are: trait (the name of the dependent variable specified in the MODEL statement), chr (the chromosome identification), marker (the marker name), position (the location on the chromosome that is scanned by PROC QTL), n_Iter (the number of iterations required for the ML method to converge), conv_err (the convergence error), LRT (the likelihood ratio test statistic), Wald (the Wald test statistic), ve (the residual error variance), intercpt (the intercept or mean), A (the additive QTL effect, named after the label given in the ESTIMATE statement) and var_1 (the variance of the estimated QTL effect; its square root is the standard error).
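Two quantities that are often wanted from this output are the standard error of the QTL effect, which is the square root of var_1, and the LOD score, which is the likelihood ratio statistic divided by 2 ln(10). The following Base SAS sketch derives both from the RESULT dataset and prints the scanned position with the largest LRT; the names RESULT_LOD, PEAK, se_A and lod are assumptions introduced here and are not created by PROC QTL.

```sas
/* Add the standard error and LOD score to each scanned position. */
data result_lod;
  set result;
  se_A = sqrt(var_1);          /* standard error of the additive effect A */
  lod  = LRT / (2 * log(10));  /* log() is the natural logarithm in SAS   */
run;

/* Sort so that the position with the largest test statistic comes first. */
proc sort data=result_lod out=peak;
  by descending LRT;
run;

proc print data=peak(obs=1) noobs;
  var chr marker position LRT lod A se_A;
run;
```

For Table 2.1 the largest LRT is 22.463 at position 51.8 (marker M7), which corresponds to a LOD score of about 4.88.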

Syntax: QTL procedure


The following statements are available in PROC QTL. Items within the < > are optional.
PROC QTL < options >;
    CLASS variable list;
    MODEL trait list = non-QTL-effects;
    MARKER variable list;
    MATINGTYPE 'label';
    GENOTYPE genotype = 'label1' genotype = 'label2' <genotype = 'label3' genotype = 'label4'>;
    ESTIMATE 'label1' = effect-contrast < 'label2' = effect-contrast 'label3' = effect-contrast >;
    RANGE number list;
    WEIGHT variable;
    BY variable list;

The PROC QTL statement invokes the procedure. The MODEL statement, the GENOTYPE statement and the ESTIMATE statement are required along with the PROC QTL statement. All other statements are optional. The following table gives a brief description of each statement of PROC QTL.
Table 3.1 PROC QTL Statements

Statement    Description
PROC QTL     invokes the procedure
CLASS        declares classification (discrete) variables
MODEL        defines the linear model to be fit for non-QTL effects, e.g., location and gender
MARKER       provides names of markers to be included in the analysis
MATINGTYPE   defines the type of line cross
GENOTYPE     defines marker genotypes
ESTIMATE     defines QTL effects (linear contrasts of genotypic values)
RANGE        specifies a region of the genome for analysis
WEIGHT       declares a weight variable
BY           declares variables as subgroups for separate analysis (data must be sorted before the procedure is called)
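Since BY-group processing requires sorted data, a typical job sorts the primary dataset first and then repeats the BY variable inside PROC QTL. The sketch below illustrates this under stated assumptions: MYDATA and its grouping variable LOCATION are hypothetical, the output name RESULT_BY is made up, and the remaining statements are copied from Program 2-3.

```sas
/* MYDATA and LOCATION are hypothetical names used only for illustration. */
proc sort data=mydata;
  by location;
run;

/* One separate analysis is carried out for each value of LOCATION. */
proc qtl data=mydata map=two out=result_by method="ml";
  by location;
  model y =;
  matingtype "BC";
  genotype A1A1='A' A1A2='B';
  estimate "A"= .5 -.5;
run;
```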

PROC QTL Statement


PROC QTL < options >;

The QTL procedure starts with the PROC QTL statement. Table 3.2 summarizes some important options in the PROC QTL statement by function. These and other options in the PROC QTL statement are then described fully in alphabetical order. Table 3.3 summarizes the options that are compatible with the methods in the PROC QTL statement.

Table 3.2 PROC QTL Statement Options

Option              Description
BURNIN=             sets the number of iterations that are discarded before the posterior samples are collected
COVERAGE=           sets the average genome coverage (in cM) of each QTL
DATA=               specifies the input data set
DISTRIBUTION        selects an appropriate distribution to describe a discrete variable
EBAYESPARM          provides the values of the hyperparameters for the empirical Bayes method
GENOTYPE=           defines the genotype updating algorithm to be used with the Bayesian method
INTERACTION         includes epistatic effects in the EBAYES method
MAP=                specifies the map data set
MAXERR=             sets the convergence criterion
MAXITER=            specifies the maximum number of iterations
METHOD=             determines the estimation method
OUT=                specifies the output data set
OUTPOST=            provides the result of the post MCMC analysis
PERMUTATION         allows users to perform the permutation analysis for the Bayesian method
POSTERIORSAMPLE=    sets the posterior sample size
POSITION=           allows users to update the QTL position during the MCMC sampling process
SEED=               sets a seed to initialize the pseudorandom generator
STEP=               gives the increment (cM) for genome scanning
TRIM=               specifies the number of iterations that are skipped between collected posterior samples

Table 3.3 Options that are compatible with the methods in the PROC QTL statement

Option            Default setting   Compatible methods (as stated in the option descriptions)
BURNIN            2000              BAYES
COVERAGE          20                BAYES
DATA                                all methods
DISTORTION        OFF               ML
DISTRIBUTION                        LS, FISHER, ML
EBAYESPARM        2, 0              EBAYES
GENOTYPE          IMPUTE            BAYES
INTERACTION       OFF               EBAYES
MAP                                 all methods
MAXERR            1.00E-08          methods that require iterations
MAXITER           100               methods that require iterations
OUT                                 all methods
OUTPOST           OFF               BAYES
PERMUTATION       OFF               BAYES
POSITION          DYNAMIC           BAYES
POSTERIORSAMPLE   500               BAYES
SEED              0                 BAYES
STEP              1                 genome-scanning methods
TRIM              20                BAYES

You can specify the following options in the PROC QTL statement.

BURNIN = number
This option sets the number of iterations that are discarded before the posterior samples are collected. The default value is 2000, but a larger number may be specified to avoid collecting samples before the MCMC chain has converged to the stationary distribution. This option is valid for the BAYES method only.

COVERAGE = number
This option sets the average genome coverage (in cM) of each QTL. The default value is 20, i.e., one QTL is placed in every 20 cM of the genome. The genome coverage per QTL determines the number of QTL included in the model: the larger the coverage per QTL, the fewer QTL are placed on the genome. The proper value depends on the POSITION option. If POSITION = "DYNAMIC" is specified, the QTL positions move along the genome, and thus a large coverage per QTL can be set. If POSITION = "STATIC" is specified, however, a small coverage per QTL should be specified, i.e., more QTL should be placed on the genome, so that every region of the genome has an equal chance of being evaluated. The program places at least one QTL per chromosome. Setting COVERAGE = 1000, i.e., one QTL per 1000 cM of the genome, is equivalent to putting no QTL on the genome. In this case, the Bayesian method evaluates only the markers, which gives a Bayesian version of the all-marker analysis. This option is valid for the BAYES method only.

DATA = SAS-dataset
This option names the input SAS data set used by the QTL procedure. This dataset is called the primary SAS dataset throughout the manual. If no dataset is named, PROC QTL takes the current SAS data set as the primary input dataset for QTL mapping.
The primary dataset should contain the phenotypic values, marker genotypes and any other variables relevant to the QTL mapping, e.g., sex, location, year and so on. Depending on the type of mapping population, the primary dataset takes one of two formats. The first format is for BC, F2, RIL and DH populations and is commonly seen in QTL mapping in line crosses. The second format is for the FW (four-way) cross and is the typical format required for pedigree data analysis. The following example shows the primary dataset of the first format, which has 29 variables and 110 observations (only observations 1-5 and 110 are listed). Marker genotype variables can be of either character or numeric type. The 26 variables m1-m26 represent 26 markers. The values of the markers are A (homozygote), B (homozygote), H (heterozygote) and U (missing value).
data format1;
  input id sex $ wt10 (m1-m26)($);
  datalines;
1 M 55.0 A A A A A H H H A U H H U B B B B B H B B B B B B B
2 M 54.2 B B B B H U U H A A A H H H H H U H U B B B H H A A
3 F 61.6 H H H H H H H H H H H H B H H H H H H H B B H H H A
4 M 66.6 H H H B H B B H B B B B B B B B B H H H B B B H H H
5 M 67.4 B B B B B B B B B B B B B B B B B B H H H H H H H H
110 M 53.2 U U A A A A A H H U H H U A A U A A A U A U U U U U
;
run;

The following example shows the primary dataset of the second format, which has 16 variables and 112 observations (only individuals 1-7 and 112 are listed). The first column is the id (identification) of an individual; the second and third columns are the ids of the sire and the dam, respectively. A sire or dam id of -1 (or any negative number) indicates that the individual is a founder. The genotype of each marker is represented by two characters (or a two-digit number): the first character (or digit) is the paternal allele and the second is the maternal allele. For the founders, the paternal and maternal alleles must be entered in the correct order. For non-founder individuals, the paternal and maternal allelic origins may be unknown, and thus the two alleles can be entered in arbitrary order; the phase in the children is irrelevant. Once the marker genotypes are entered in the second format, the GENOTYPE statement is not needed to convert the genotypes. The second format requires parental data to be entered into the pedigree before the data of their children. The FW design involves only two parents, and thus the first two rows are reserved for the two parents and the children occupy the remaining rows of the dataset.
data format2;
  input id sir dam sex $ wt10 (m1-m10)($);
  datalines;
1 -1 -1 F -999 AB AB AB AB AB AB AB AB AB AB
2 -1 -1 M -999 AB AB AB AB AB AB AB AB AB AB
3 1 2 M 55.0 AA AA AA AA AA AB AB AB AA UU
4 1 2 M 54.2 BB BB BB BB AB UU UU AB AA AA
5 1 2 F 61.6 AB AB AB AB AB AB AB AB AB AB
6 1 2 M 66.6 AB AB AB BB AB BB BB AB BB BB
7 1 2 M 67.4 BB BB BB BB BB BB BB BB BB BB
112 1 2 M 53.2 UU UU AA AA AA AA AA AB AB UU
;
run;
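The layout rules for the second format (the two parents in the first two rows, founders marked by negative sire and dam ids, children afterwards) can be checked before the analysis. The following data step is a minimal sketch of such a check, assuming the FORMAT2 dataset above; the variable is_founder and the wording of the message are assumptions introduced for illustration.

```sas
/* Check that exactly the first two rows are founders, as the FW design expects. */
data _null_;
  set format2;
  is_founder = (sir < 0 and dam < 0);  /* negative sire and dam ids mark a founder */
  if (_n_ <= 2) ne is_founder then
    put "WARNING: unexpected founder status in row " _n_ id= sir= dam=;
run;
```

Nothing is written to the log when the first two records are the parents and every later record names a sire and a dam.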


Users may use either "." or "-999.999" to indicate missing phenotypic values in the primary dataset. PROC QTL handles missing (unobserved) phenotypic values in one of two ways. For the Bayesian method only, the missing phenotypic values are sampled from their conditional posterior distribution within each iteration. For all other (non-Bayesian) methods, missing phenotypic values are replaced by the observed population means of the traits. Users may also delete observations with missing phenotypic values in a data step before calling the QTL procedure. The current version of PROC QTL (except the Bayesian method) cannot handle missing phenotypic values for discrete or count data. Users are strongly encouraged to delete observations with missing phenotypic values in a data step. Optimal strategies for handling missing values are under development and will be available soon.

DISTRIBUTION = 'distribution-type'
This option allows users to select an appropriate distribution to describe a discrete variable. The current version of PROC QTL supports two distributions: the 'POISSON' distribution and the 'BINOMIAL' distribution. More distributions will be added later. Although Poisson and binomial variables are discrete, users should NOT declare them in the CLASS statement once this option is specified. This option is valid for the LS, FISHER and ML methods.

EBAYESPARM = {number, number}
Users can use this option to provide the values of the two hyperparameters of the empirical Bayes method (EBAYES). By default, the hyperparameters are {2,0}, which is equivalent to an unbounded uniform prior for the variance component of each QTL effect (regression coefficient). Under this setting, an explicit solution exists for each variance component conditional on the other variance components in each iteration. Otherwise, the SIMPLEX algorithm is used to find the numerical values. In most cases, PROC QTL generates reasonably good results with less computing time under the default hyperparameter setting. Therefore, it is strongly recommended that users leave this option at its default. This option is valid for the EBAYES method only. In the Bayesian shrinkage analysis (WANG et al. 2005; XU 2003), the hyperparameter values are {0,0}, which is also called the Jeffreys prior or vague prior, i.e., $p(\sigma_k^2) = 1/\sigma_k^2$.

GENOTYPE = 'genotype-updating-approach'
This option defines the genotype updating algorithm to be used when the Bayesian method of QTL mapping is selected. It has no effect with any other method. The current version provides two alternatives: "IMPUTE" and "EXPECT". When "IMPUTE" is specified, the QTL genotype of each individual is sampled from its conditional posterior probability distribution. If "EXPECT" is specified, the QTL genotypes are replaced by their conditional expectations. "IMPUTE" is the default value. This option is valid for the BAYES method only.

INTERACTION
This option is a switch that indicates whether or not epistatic effects are included in the analysis. In the current version of the program, this option is valid for the EBAYES method only.

MAP = SAS-dataset
This option names the SAS data set for the linkage map of marker loci. A valid map dataset must always contain the following three variables in this order: marker name, marker location and chromosome. The marker name must be a character variable. The location of a marker must be numeric and measured in cM. When the MAP=SAS-dataset option is used, PROC QTL performs interval mapping based on this map. If this option is not used, you need to provide the names of the markers you want to analyze using the MARKER statement; PROC QTL then conducts marker analysis without a map (see the MARKER statement described later). The following example shows a map dataset with three variables and 26 observations (only markers M1-M4 and M26 are listed). The marker names in the MAP dataset must match the marker variables defined in the primary SAS dataset.
data map;
  input marker $ position chromosome;
  datalines;
M1   0.0  1
M2  19.6  1
M3  25.5  1
M4  26.4  1
M26 87.9  2
;
run;
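As noted in the description of the DATA= option, observations with missing phenotypic values can be removed in a data step before PROC QTL is called. The following is a minimal sketch of that step, assuming the primary dataset ONE with phenotype y from Program 2-1; the output name ONE_COMPLETE is an assumption.

```sas
/* Keep only records with an observed phenotype. */
data one_complete;
  set one;
  /* drop the SAS missing value "." and the numeric missing code -999.999 */
  if missing(y) or abs(y + 999.999) < 1e-6 then delete;
run;
```

The cleaned dataset would then be passed to the procedure with DATA=one_complete.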

MAXERR = maximum-convergence-error
This option sets the convergence criterion for any method that requires iterations. The default value is MAXERR=1E-8. A smaller value may increase the number of iterations required to converge.

MAXITER = maximum-number-of-iterations
This option defines the maximum number of iterations allowed for any numerical method specified in the METHOD option. If this option is absent, the default maximum number of iterations is 100.

METHOD = 'method' < /DISTORTION >
This option specifies the statistical method for QTL mapping. Six methods are available in the current version of the program: the least squares method (LS) (HALEY and KNOTT 1992), the iteratively reweighted least squares method (IRLS) (XU 1998a, b), the maximum likelihood method (ML) (LANDER and BOTSTEIN 1989), the Fisher scoring method (FISHER) (HAN and XU 2008), the Bayesian method (BAYES) (WANG et al. 2005) and the empirical Bayesian method (EBAYES) (XU 2007). If the METHOD option is not specified, the default method is ML. With the ML method, users may further test segregation distortion and map QTL based on non-Mendelian segregation ratios by adding the DISTORTION option (XU 2008; XU and HU 2009). For example, users who analyze the mouse data may want to test segregation distortion using the following option,
METHOD='ML'/DISTORTION

If the DISTORTION option is omitted, Mendelian segregation is assumed when calculating the conditional probability of the QTL genotype given the marker information. The DISTORTION option takes effect only when the ML method is used. You may specify the DISTORTION option with other methods, but it will have no effect.
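Before turning on the DISTORTION option, it can be useful to look at how far each marker departs from the Mendelian ratio. The sketch below is a marker-wise chi-square test in Base SAS, not the likelihood ratio test that PROC QTL performs at scanned positions; it assumes the backcross dataset ONE of Program 2-1, where genotypes A and B are expected in a 1:1 ratio.

```sas
/* Chi-square goodness-of-fit test of 1:1 segregation at marker m1. */
proc freq data=one;
  where m1 ne 'U';                  /* exclude missing genotypes      */
  tables m1 / chisq testp=(50 50);  /* expected percentages for A, B  */
run;
```

Replacing m1 by another marker variable repeats the test at that marker.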
Table 3.4 Legal values of the METHOD option
Methods: LS, IRLS, ML, FISHER, BAYES, EBAYES
Single trait: continuous, discrete, Poisson / Binomial, distortion analysis
Multiple traits: continuous, discrete, continuous & discrete

Note: Q, the method can be used to perform QTL mapping; M, the method is only valid for marker analysis.

OUT = SAS-dataset
This option names an output data set that contains the results of QTL mapping. The number of variables and observations depends on other options or statements provided by the user. Basically, the dataset contains the chromosome positions scanned (called the virtual map), the number of iterations taken for convergence, the test statistics (LRT and Wald), the regression coefficients for non-QTL effects, the estimated residual error variance, the estimated QTL effects (linear contrasts of genotypic values) and the variance-covariance matrix of the estimated QTL effects. If the DISTORTION suboption is specified with the METHOD='ML' option (see the METHOD= option), the output data set also contains the calculated segregation proportions of genotypes at the scanned positions and the likelihood ratio test statistic for the deviation from the Mendelian segregation ratio. If METHOD=BAYES, the output is entirely different from that of any other method. For the LS, IRLS, FISHER and ML methods, the following variables typically appear in the output SAS dataset.
Table 3.5 Variables in output dataset of LS, IRLS, FISHER and ML methods

Variables                   Description
trait                       The name of the dependent variable specified in the MODEL statement
chr                         Chromosome identification of the position scanned
marker                      The name of the molecular marker variable described in the primary dataset
position                    Location of each assumed locus in the linkage map
n_iter                      Number of iterations required for convergence
conv_err                    Convergence error
LRT                         Likelihood ratio test statistic
Wald                        Wald test statistic
ve                          Residual error variance
intercpt                    The intercept or the mean of the dependent variable (trait)
var_i                       The variance of the i-th user-defined QTL effect
cov_i_j                     The covariance between the i-th and the j-th QTL effects
intcpt_1 ... intcpt_n       The intercepts for ordinal data analysis. The first and last values are -1E10 and 1E10, respectively.
LRT_dist                    The test statistic for segregation distortion analysis
freq_AA freq_AB freq_BB     The estimated frequencies of genotypes AA, AB and BB for the locus scanned

Table 3.6 Variables in output dataset of BAYES method

Variables                   Description
fixed_i                     The i-th fixed effect (non-QTL effect)
ve                          The residual error variance
intcpt_1, ..., intcpt_n     The intercepts for ordinal data analysis. The first and last values are -1E10 and 1E10, respectively.
chr_i                       Chromosome identification of the i-th QTL included in the model
p_i                         The location of the i-th QTL in the corresponding linkage group (chromosome)
a_i                         The first user-defined effect for the i-th QTL
b_i                         The second user-defined effect for the i-th QTL
va_i                        The variance of the first user-defined effect for the i-th QTL
vb_i                        The variance of the second user-defined effect for the i-th QTL


The BAYES method produces an output file containing the posterior sample of all variables generated in the MCMC sampling process. The variables in the posterior sample are listed in Table 3.6.

OUTPOST = SAS-dataset </ { options }>
This option allows users to perform post MCMC analysis for the Bayesian method. By default, the result of the post MCMC analysis contains the posterior sample size, the posterior means and the posterior variance-covariance matrix of the estimated QTL effects for each putative position of the genome. For example, if you use the following option,
OUTPOST = RESULT / {STEP = 5.0}

the result dataset will contain the posterior sample size (count), the posterior means and the posterior variance-covariance matrix of all estimated QTL effects for every 5 cM segment of the genome. The content of the OUTPOST dataset for the Bayesian analysis is similar to that of the OUT dataset of the non-Bayesian methods (see Table 3.5), except that a new variable named 'count' is added. The count is the number of times a QTL hits each defined segment of the genome. The OUTPOST= option is valid for the BAYES method only. The sub-options available in the OUTPOST option are described as follows.
STEP = number

This sub-option specifies the bin size of the genome for the post MCMC analysis. The value should be a number larger than 0.05 and smaller than 10; otherwise, the sub-option is ignored. The default is STEP = 1.0. This sub-option is valid only when POSITION=DYNAMIC is specified; otherwise, it is ignored.
MCMCINPUT=SAS-dataset

This sub-option names a SAS dataset that contains a previously generated MCMC sample. When a dataset is provided here, PROC QTL skips the sampling process and performs the post MCMC analysis directly on the supplied posterior sample.
QUANTILE ={ number list }

This sub-option specifies the quantiles (percentiles) of the posterior sample to be reported for each estimated QTL effect. Valid values are numbers between 0 and 1. A number outside this range is replaced with the nearest boundary value, i.e., 0 for a negative number and 1 for any value larger than one. This option is only valid when POSITION=STATIC is turned on; otherwise, it will be ignored.
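The binning controlled by the STEP sub-option and the boundary rule of the QUANTILE sub-option can be illustrated with a short Python sketch. This is not PROC QTL code. The function names, the toy positions, the left-closed bins, and the reading of "ignored" as "fall back to the default of 1.0" are all assumptions made for illustration.

```python
import math
from collections import Counter

def effective_step(step, default=1.0):
    """Return the bin size; values outside the documented (0.05, 10)
    range are assumed to fall back to the default."""
    return step if 0.05 < step < 10 else default

def bin_counts(positions, step=1.0):
    """Count sampled QTL positions (in cM) per bin of width `step`.
    Bin k is taken as [k*step, (k+1)*step); this closure is an assumption."""
    width = effective_step(step)
    return Counter(math.floor(p / width) for p in positions)

def clamp_quantiles(quantiles):
    """Replace out-of-range quantiles with the nearest boundary (0 or 1)."""
    return [min(1.0, max(0.0, q)) for q in quantiles]

# Hypothetical QTL positions drawn during sampling, binned at 5 cM.
counts = bin_counts([0.4, 3.2, 4.9, 5.0, 12.7], step=5.0)
quantiles = clamp_quantiles([-0.1, 0.025, 0.975, 1.5])
```

In this sketch the value stored per bin plays the role of the 'count' variable described above: the number of posterior draws whose QTL position falls in that segment.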


PERMUTATION

This option allows users to perform permutation analysis for the Bayesian method. Once the PERMUTATION option is turned on in the PROC QTL statement, the phenotypic values are reshuffled before the parameters are sampled in every cycle of the MCMC sampling process. Since the QTL effects in the posterior sample are then drawn from the null distributions, users can infer the 95% and 99% confidence intervals of the null distributions. QTL effects that fall outside the 95% confidence intervals of the null distributions are considered significant. The permutation test is better performed in marker analysis or with fixed QTL positions, i.e., POSITION = STATIC. A demonstration of the permutation analysis can be found in Example 9. Note that the PERMUTATION option is valid for the Bayesian method only.

POSITION = 'position-updating-approach' </RANDOM>

This option controls how the QTL position is updated during the MCMC sampling process. There are two approaches: DYNAMIC and STATIC. When the DYNAMIC approach is specified, the QTL position is updated using the Metropolis-Hastings algorithm. If the /RANDOM suboption is turned on, the QTL position is updated randomly, i.e., a new position is randomly selected in the neighborhood of the old position and is always accepted, without applying the Metropolis-Hastings criterion. When STATIC is specified, the QTL positions stay where they are throughout the entire MCMC sampling process. This option is valid for the BAYES method only.

POSTERIORSAMPLE = number

This option sets the posterior sample size, i.e., the number of observations saved in the MCMC sampling process. This option is valid for the BAYES method only.

SEED = number

This option sets a seed to initialize the pseudorandom number generator used by the BAYES method. By default, SEED = 0, in which case PROC QTL takes the current time from the system clock as the random seed and produces different results each time the program is executed. When SEED = x for x > 0, the results are repeatable. In other words, a separate run of the program with the same non-zero seed will generate exactly the same result.

STEP = number </FIXED>

This option gives the increment (step size in cM) for genome scanning. The default is 1 cM. Without the /FIXED suboption, the step size may vary from one interval to another to make sure that marker positions are included in the virtual map. The number assigned to the step size is the maximum increment allowed in the scanning. For example, if STEP=2 is chosen, PROC QTL will scan the genome every d cM, where d ≤ 2. The value of d will be equal to 2 if and only if the interval size divided by 2 is a whole number (integer). With the /FIXED suboption, the number assigned is exactly the step size, except that marker positions are forced to be included in the virtual map. For example, if STEP=2/FIXED, the genome will be scanned every 2 cM, except that the step prior to a marker may be less than 2 cM if the interval size divided by 2 is not a whole number. Since PROC QTL generates a virtual map that always includes the marker positions, users may use the STEP=1E8 option to perform marker analysis only.

TRIM = number

This option specifies the number of iterations that will be skipped between collections of the posterior sample after the burn-in period. For example, if TRIM=20 is specified, one observation is collected in every 20 iterations after the burn-in period. A larger trimming number decreases the serial correlation between consecutive observations of the posterior sample. Again, the TRIM option is only valid when the Bayesian method is used for QTL mapping.
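The STEP behaviour described above (with and without /FIXED) can be sketched in Python. This is an illustration, not the PROC QTL implementation. The function name `scan_positions` and the rule "divide each marker interval into the smallest number of equal steps not exceeding STEP" are assumptions inferred from the description.

```python
import math

def scan_positions(markers, step=1.0, fixed=False):
    """Return the scan positions (cM) on one chromosome.

    `markers` holds the sorted marker positions. Marker positions are
    always part of the returned virtual map.
    """
    positions = [markers[0]]
    for left, right in zip(markers, markers[1:]):
        length = right - left
        if fixed:
            # Exact steps of size `step`; the last step before the
            # next marker may be shorter.
            k = 1
            while left + k * step < right - 1e-9:
                positions.append(left + k * step)
                k += 1
        else:
            # Equal steps d = length / n with d <= step.
            n = max(1, math.ceil(length / step - 1e-9))
            positions.extend(left + k * length / n for k in range(1, n))
        positions.append(right)
    return positions

variable = scan_positions([0.0, 19.6], step=1.0)        # d = 0.98 cM
fixed = scan_positions([0, 5], step=2, fixed=True)       # 0, 2, 4, 5
markers_only = scan_positions([0, 10, 25], step=1e8)     # marker analysis
```

Under this rule, STEP=1.0 on a 19.6 cM interval gives 20 steps of 0.98 cM, which agrees with the positions listed in Table 4.1.2 between markers M1 and M2.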

BY Statement
BY variable;

Users may use a BY statement to obtain separate analyses of observations in groups defined by the BY variable. The primary dataset must be sorted by the BY variable before PROC QTL is executed. Users may declare more than one BY variable, just as in the BY statement of any built-in SAS procedure.

CLASS Statement
CLASS variable list;

The CLASS statement declares one or more variables (variable list) as discrete variables. Typical class variables are TREATMENT, SEX, RACE, GROUP and REPLICATION. Users may also declare a TRAIT (DEPENDENT variable) in the CLASS statement. If a trait is declared as a CLASS variable, it will be treated as a binary or ordered categorical trait, and the generalized linear model (GLM) under the PROBIT link function will be used to perform the QTL mapping. Variables declared in the CLASS statement are recoded and decomposed into one or more single contrasts by using the full rank design function (see the DESIGNF function in the PROC IML environment). Therefore, users may expect to see more non-QTL effects in the output dataset than the number of variables included in the MODEL statement. Variables with non-integer values are valid for inclusion in the CLASS statement, but each different value will be treated as a separate category during the recoding process. An excessive number of categories may cause the program to halt because PROC QTL can handle a maximum of 10 categories.
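The full rank recoding of a CLASS variable can be sketched in Python. The sketch assumes the usual DESIGNF convention from SAS/IML, in which a variable with k levels yields k-1 columns and the last level is coded -1 in every column. Whether PROC QTL orders the levels exactly this way is an assumption, and the function name is hypothetical.

```python
def full_rank_design(values):
    """Recode a class variable into k-1 contrast columns (DESIGNF-style)."""
    levels = sorted(set(values))
    k = len(levels)
    rows = []
    for v in values:
        idx = levels.index(v)
        if idx == k - 1:
            # The last level is coded -1 in every column.
            rows.append([-1] * (k - 1))
        else:
            rows.append([1 if j == idx else 0 for j in range(k - 1)])
    return rows

three_levels = full_rank_design([1, 1, 2, 2, 3, 1])
sex = full_rank_design(["F", "M", "M", "F"])
```

A two-level variable such as SEX therefore contributes one column, and a variable with three levels contributes two, which is why the output dataset can contain more non-QTL effects than the MODEL statement lists.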

ESTIMATE Statement
ESTIMATE 'QTL-effect-name' = contrast < 'QTL-effect-name' = contrast >;

With the ESTIMATE statement, a user can define up to three (one, two or three) different QTL effects expressed as linear functions (or contrasts) of the genotypic values. For the BC, RIL1, RIL2 and DH mating designs, there are only two genotypes. Therefore, only one QTL effect can be defined. For the BC design, the two genotypes follow this order: A1A1 and A1A2. Let G11 and G12 be the genotypic values. The QTL effect is defined as A=G11-G12. Therefore, the estimate statement appears as
ESTIMATE 'A' =1 -1;

For the remaining two-genotype mating designs (RIL1, RIL2 and DH), the order of the two genotypes is: A1A1 and A2A2. The QTL effect is defined as A=G11-G22. Therefore, the estimate statement is
ESTIMATE 'A'=1 -1;

Users can use any other linear combination as the QTL effect. For example, a user may prefer using A=0.5G11-0.5G22 as the QTL effect, in which case the estimate statement should be
ESTIMATE 'A'=0.5 -0.5;

Users may also want to express the QTL effect as A=G11 assuming that G22=0. The estimate statement should be written as
ESTIMATE 'A'=1 0;

For an F2 mating design, the order of the three genotypes is A1A1, A1A2 and A2A2. Users can define up to two QTL effects, additive and dominance effects. Let A=G11-G22 be the additive effect and D=G12-0.5(G11+G22) be the dominance effect. The estimate statement is
ESTIMATE 'A'=1 0 -1 'D'=-0.5 1 -0.5;


Users have the flexibility to choose any other scale to define the QTL effects with the estimate statement. If a user only wants to fit an additive model, the corresponding estimate statement should be ESTIMATE 'A'=1 0 -1; simply omitting the contrast for the dominance effect.

For a FW cross design, the order of the four possible genotypes is as follows: A1A3, A1A4, A2A3 and A2A4, assuming that the parental mating type is A1A2 × A3A4. Users can define up to three QTL effects (i.e., one, two or three QTL effects can be estimated). Users can define the three effects in an arbitrary fashion, but we recommend the following estimate statement,
ESTIMATE 'A_m'=1 1 -1 -1 'A_f'=1 -1 1 -1 'D'=1 -1 -1 1;

where A_m is the difference between the two alleles of the male parent, A_f is the difference between the two alleles of the female parent and D is the interaction between A_m and A_f (the so-called dominance effect).
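Each contrast in the ESTIMATE statement is a set of coefficients applied to the genotypic values in the order given for the mating design. The arithmetic can be sketched in Python as follows. The genotypic values used here are made-up numbers, and the function name is hypothetical.

```python
def qtl_effects(genotypic_values, contrasts):
    """Apply each named contrast to the ordered genotypic values."""
    effects = {}
    for name, coefs in contrasts.items():
        if len(coefs) != len(genotypic_values):
            raise ValueError(f"contrast {name!r} has the wrong length")
        effects[name] = sum(c * g for c, g in zip(coefs, genotypic_values))
    return effects

# F2: genotypes ordered A1A1, A1A2, A2A2 (hypothetical values G11, G12, G22).
f2 = qtl_effects([10.0, 8.0, 4.0],
                 {"A": [1, 0, -1], "D": [-0.5, 1, -0.5]})

# FW: genotypes ordered A1A3, A1A4, A2A3, A2A4 (hypothetical values).
fw = qtl_effects([5.0, 3.0, 2.0, 1.0],
                 {"A_m": [1, 1, -1, -1],
                  "A_f": [1, -1, 1, -1],
                  "D": [1, -1, -1, 1]})
```

For the F2 values above, A = G11 - G22 = 6 and D = G12 - 0.5(G11 + G22) = 1, matching the definitions given for the additive and dominance effects.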

GENOTYPE Statement
GENOTYPE genotype = 'label1' genotype = 'label2' <genotype = 'label3'>;

This statement is required for all matingtypes except the FW design. Depending on the type of line cross (specified by the MATINGTYPE statement), the number of possible marker genotypes varies from two (e.g., BC population) to four (e.g., four-way cross). Investigators may code the genotypes arbitrarily in the primary dataset. However, PROC QTL requires a conversion from the genotype labels in the primary dataset to the labels that are recognizable by the procedure. This conversion is accomplished by using the genotype statement. Since the number of possible genotypes per locus depends on the type of line cross (matingtype), the genotype labels are described under each matingtype.

BC mating design

The genotype statement for the BC design is either
GENOTYPE A='label1' H='label2';

or
GENOTYPE A1A1='label1' A1A2='label2';

where label1 and label2 are the homozygote and heterozygote, respectively, given in the primary dataset. Note that a BC population contains only two genotypes.


DH mating design

The genotype statement for the double haploid design is either
GENOTYPE A='label1' H='label2';

or
GENOTYPE A1A1='label1' A1A2='label2';

where label1 and label2 are the two homozygotes. Note that a DH population contains only two genotypes.

F2 mating design

The genotype statement is
GENOTYPE A1A1='label1' A1A2='label2' A2A2='label3';

The three labels (label1, label2 and label3) are simply the character values used in the primary dataset to indicate the three genotypes (first homozygote, heterozygote and second homozygote). Any other character value (not label1, label2 or label3) appearing in the primary dataset will be treated as a missing value. An alternative genotype conversion system is
GENOTYPE A='label1' H='label2' B='label3';

Again, label1, label2 and label3 are the character values used in the primary dataset to indicate the first homozygote, the heterozygote and the second homozygote, respectively.

FW mating design

The four-way mating design requires an entirely different system for genotype data input and the GENOTYPE statement is not needed (see the primary SAS dataset of format2).

RIL mating design

The genotype statement for the recombinant inbred lines (RIL1 and RIL2) is either
GENOTYPE A='label1' B='label2';

or
GENOTYPE A1A1='label1' A2A2='label2';

where label1 and label2 are the two homozygotes. Note that an RIL population contains only two genotypes.
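The label conversion performed by the GENOTYPE statement amounts to a lookup from user labels to the procedure's genotype codes, with every unlisted label treated as missing. A minimal Python sketch follows. The function name is hypothetical, `None` stands in for a SAS missing value, and the case-sensitive matching of labels is an assumption.

```python
def recode_genotypes(raw_labels, label_map):
    """Map user labels to genotype codes; unlisted labels become missing."""
    return [label_map.get(label) for label in raw_labels]

# Mirrors: GENOTYPE A1A1='A' A1A2='H' A2A2='B';
f2_map = {"A": "A1A1", "H": "A1A2", "B": "A2A2"}
recoded = recode_genotypes(["A", "H", "B", "-", "a"], f2_map)
```

Here the labels '-' and 'a' are not declared in the mapping, so both are returned as missing.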


MARKER Statement
MARKER variable list;

This statement defines the marker variables to be included in the marker analysis when a map dataset is not provided. Sometimes, investigators may be interested in marker analysis before a map has been generated (map not available). In this case, users must provide the names of the markers to be included in the analysis, and these markers are declared using the MARKER statement. If users provide both the map dataset and the MARKER statement, the MARKER statement has no effect.

MATINGTYPE Statement
MATINGTYPE 'matingtype';

The MATINGTYPE statement is used to define the type of population. The current version of the program can handle five different populations (mating types): BC (backcross), F2, RIL (recombinant inbred lines), DH (double haploid) and FW (four-way cross). There are two different types of RIL. RIL1 is created by selfing the F2 for many generations. RIL2 is generated by brother-sister mating for many generations. If MATINGTYPE 'RIL' is used without specifying which of the two RIL types is meant, it is interpreted as MATINGTYPE 'RIL1'. We recommend using MATINGTYPE 'RIL1' or MATINGTYPE 'RIL2' explicitly to eliminate any confusion. Note that the MATINGTYPE statement affects the declaration of the GENOTYPE statement and the ESTIMATE statement.
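The dependence of the ESTIMATE statement on the mating type can be summarized as a small lookup, sketched here in Python. The genotype counts and effect limits are those stated in the ESTIMATE section, and 'RIL' resolves to 'RIL1' as described above. The table layout and function names are hypothetical, and this is not how PROC QTL itself validates its input.

```python
# Number of genotypes and maximum number of QTL effects per mating type.
DESIGNS = {
    "BC":   {"genotypes": 2, "max_effects": 1},
    "RIL1": {"genotypes": 2, "max_effects": 1},
    "RIL2": {"genotypes": 2, "max_effects": 1},
    "DH":   {"genotypes": 2, "max_effects": 1},
    "F2":   {"genotypes": 3, "max_effects": 2},
    "FW":   {"genotypes": 4, "max_effects": 3},
}

def resolve_matingtype(name):
    """Normalize the mating type; a bare 'RIL' means 'RIL1'."""
    key = name.upper()
    return "RIL1" if key == "RIL" else key

def check_contrasts(matingtype, contrasts):
    """Raise ValueError if the contrasts do not fit the mating type."""
    design = DESIGNS[resolve_matingtype(matingtype)]
    if not 1 <= len(contrasts) <= design["max_effects"]:
        raise ValueError("too many (or no) QTL effects for this design")
    for name, coefs in contrasts.items():
        if len(coefs) != design["genotypes"]:
            raise ValueError(f"contrast {name!r} needs "
                             f"{design['genotypes']} coefficients")
    return True

ok_f2 = check_contrasts("F2", {"A": [1, 0, -1], "D": [-0.5, 1, -0.5]})
ok_fw = check_contrasts("fw", {"a_m": [1, 1, -1, -1], "a_f": [1, -1, 1, -1]})
```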

MODEL Statement
MODEL trait = < non-QTL-effects >;

The MODEL statement names the traits (also called the dependent variables) and the non-QTL effects (independent variables). The dependent variables appear on the left-hand side of the equation and the non-QTL effects appear on the right-hand side. Users may specify multiple continuous variables (traits) on the left-hand side of the MODEL equation, but categorical traits must be analyzed one at a time, i.e., only one discrete variable is allowed on the left-hand side of the "=" sign. All variables that appear as traits in the MODEL statement must be numerical variables defined in the primary dataset. If a categorical variable has been coded as a character variable, it must be recoded as a numerical variable in the primary dataset and then declared in the CLASS statement before PROC QTL can analyze it as an ordinal trait. A valid categorical trait may contain 2 to 10 categories, and the observations in each category must make up at least 5% of the sample size. If the variable has more than 10 categories, or if one or more categories contain less than 5% of the sample, the program stops and the user is asked to combine some of the categories before the program is re-executed.

Both numerical and character variables are acceptable as non-QTL effects in the MODEL statement. Discrete variables, however, need to be declared in the CLASS statement before they are entered into the MODEL statement.

If the DISTRIBUTION option in the PROC QTL statement is defined as "BINOMIAL", users can specify the trait in the form of a single variable (binary trait only) or in the form of a ratio of two variables denoted by events/trials.
MODEL events/trials = < non-QTL-effects >;

This form is applicable only to summarized Bernoulli response data. When each observation in the input dataset contains the number of events (for example, successes) and the number of trials from a set of Bernoulli trials, use the events/trials syntax. In the events/trials syntax, users need to specify two variables that contain the event and trial counts, separated by a slash (/). The values of both events and trials must be nonnegative, and the value of the trials variable must be greater than 0 for an observation to be valid. When each observation in the input dataset contains a single trial from a Bernoulli experiment, use the first form of the MODEL specification.

If no non-QTL effects occur in the data, simply use "MODEL trait = ;" or "MODEL events/trials = ;" without specifying any variables on the right-hand side of the "=" sign. By default, the intercept (or mean) is always included in the analysis unless the /NOINTERCEPT option is specified in the MODEL statement.

The markers are independent variables and, normally, all independent variables should appear on the right-hand side of the model equation. However, the MODEL statement designed here already assumes that all markers in the map dataset have been included, and thus marker variables should not appear again on the right-hand side of the equation. If a map dataset is not provided, PROC QTL will conduct marker analysis (not interval mapping), and the markers to be analyzed should be declared in the MARKER statement (see the MARKER Statement section).
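The validity rules stated for categorical traits (2 to 10 categories, each holding at least 5% of the sample) and for events/trials observations can be checked before running the procedure. The following Python sketch shows one way to do so. The function names and message wording are hypothetical, and the sketch does not reproduce PROC QTL's own error handling.

```python
from collections import Counter

def check_categorical_trait(values, min_cat=2, max_cat=10, min_share=0.05):
    """Return a list of problems; an empty list means the trait is usable."""
    counts = Counter(values)
    n = len(values)
    problems = []
    if not min_cat <= len(counts) <= max_cat:
        problems.append(f"{len(counts)} categories (allowed: "
                        f"{min_cat} to {max_cat})")
    for level, count in sorted(counts.items()):
        if count < min_share * n:
            problems.append(f"category {level} has only {count} of {n} "
                            "observations; consider merging it")
    return problems

def valid_events_trials(events, trials):
    """Both counts must be nonnegative and trials must exceed 0."""
    return events >= 0 and trials > 0

# Hypothetical trait with a sparse third category (3% of 100 records).
sparse = check_categorical_trait([1] * 50 + [2] * 47 + [3] * 3)
balanced = check_categorical_trait([1] * 10 + [2] * 10)
```

A non-empty result, as for the sparse trait above, indicates that some categories should be combined before the trait is declared in the CLASS statement.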

RANGE Statement
RANGE number-list;

This optional statement allows users to analyze a subset of the genome. The valid number range is from 1 to m, where m is the total number of points to be scanned in the virtual map. This statement is useful once a user has completed the interval mapping and wants to carry out further analysis, such as composite interval mapping. Users can hand-pick markers as co-factors, put these co-factors into the MODEL statement as non-QTL effects, and then scan a region of interest using the RANGE statement. The result of the scan for the specified region will be equivalent to that of composite interval mapping because the co-factors (markers) have been taken into account as non-QTL effects listed in the MODEL statement.

WEIGHT Statement
WEIGHT variable;

This statement declares a variable in the primary dataset as the weight of each observation. The WEIGHT statement is very useful if the data points for the phenotypic values are averages of several individual plants of the same genotype (replicated experiment). In this case, the WEIGHT variable is the number of plants used to calculate the data point (average phenotype). Only one weight variable can be defined. If more than one variable is listed, only the first one is used as the weight variable. Missing values and negative values of the weight variable are treated as 0, and an observation with a weight of 0 is eliminated from the data analysis.
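The treatment of missing and negative weights can be sketched in Python. A weighted mean is used here only as the simplest example of a weighted statistic and is not a description of PROC QTL's estimation method. The function names are hypothetical, and `None` and NaN stand in for SAS missing values.

```python
import math

def effective_weights(weights):
    """Treat missing (None/NaN) and negative weights as 0."""
    cleaned = []
    for w in weights:
        if w is None or math.isnan(w) or w < 0:
            cleaned.append(0.0)
        else:
            cleaned.append(float(w))
    return cleaned

def weighted_mean(values, weights):
    """Weighted mean; observations with weight 0 drop out of the result."""
    w = effective_weights(weights)
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)

# Hypothetical family means with the number of plants as the weight.
weights = effective_weights([3, None, -2, float("nan"), 5])
mean = weighted_mean([10.0, 99.0, 99.0, 99.0, 20.0],
                     [3, None, -2, float("nan"), 5])
```

Only the first and last records contribute to the result, since the other three weights are set to 0.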

Details: QTL procedure


Details of the methods and algorithms implemented by PROC QTL can be found in a book entitled "Principles and Procedures of QTL Mapping". After PROC QTL is installed, users may access the contents via the shortcut "start menu -> Programs -> PROC QTL". Users may also download the PDF version of the book from our website: http://www.statgen.ucr.edu


Examples: QTL procedure


Example 1: QTL mapping for continuous trait
This example applies PROC QTL to real-life data from an F2 mouse population (LAN et al. 2006). The population contains 145 mice genotyped at 196 markers. However, the example only uses the data of the first two chromosomes, which carry 26 markers. The trait is the body weight at week 10 (wt10). The primary dataset and the MAP dataset in this example are named E1DATA and E1MAP, respectively. All datasets used in examples 1-6 are copied to the SASUSER library when PROC QTL is installed. The variable SEX can be used as a non-QTL effect. The following code scans the two chromosomes for QTL for the trait wt10.
/* Program 4-1-1 */

proc qtl data=sasuser.E1data map=sasuser.E1map out=result method='ml'/distortion step=1.0;
   class sex;
   model wt10 = sex;
   matingtype 'F2';
   genotype A1A1='A' A1A2='H' A2A2='B';
   estimate 'additive'=1 0 -1;
run;

The output is a SAS dataset named RESULT, which has 222 observations and 17 variables, as shown below in Table 4.1.1.

PROC QTL also provides a BY statement. By using this statement, users may perform the analysis for each gender separately. Note that, as with other SAS procedures, the primary dataset must be sorted before the BY statement can be used in the analysis. The following statements prepare the primary dataset and conduct QTL mapping separately for each gender.
/* Program 4-1-2 */

proc sort data=sasuser.E1data out=mouse2;
   by sex;
run;

proc qtl data=mouse2 map=sasuser.E1map out=result method='ml'/distortion step=1.0;
   model wt10 = ;
   matingtype 'F2';
   genotype A1A1='A' A1A2='H' A2A2='B';
   estimate 'additive'=1 0 -1;
   by sex;
run;

The output generated from the above code is a SAS dataset named RESULT, which has 444 observations and 17 variables, as shown in Table 4.1.2.

Table 4.1.1 Output dataset of the mouse wt10 QTL mapping generated by Program 4-1-1.

Obs  trait  chr  marker  position  n_Iter  conv_err     LRT    Wald       ve  intercpt  fix_A_1  additive   var_1  LRT_dist  freq_AA  freq_AB  freq_BB
  1  wt10     1  M1             0       4  4.84E-09  2.9103  3.2649  31.9097   60.0733  -1.6046   -1.4203  0.6178    1.7158   0.2000   0.5116   0.2884
  2  wt10     1              0.98       5  2.12E-09  3.2571  3.8162  31.7571   60.0532  -1.6173   -1.5333  0.6160    1.9153   0.1967   0.5120   0.2913
  3  wt10     1              1.96       6  1.04E-09  3.6267  4.4236  31.5911   60.0315  -1.6301   -1.6479  0.6139    2.1249   0.1935   0.5123   0.2942
  4  wt10     1              2.94       6  8.07E-09  4.0136  5.0764  31.4151   60.0087  -1.6428   -1.7617  0.6113    2.3414   0.1904   0.5125   0.2971
  5  wt10     1              3.92       7  2.89E-09  4.4106  5.7590  31.2335   59.9853  -1.6551   -1.8720  0.6085    2.5610   0.1875   0.5125   0.3001
  6  wt10     1               4.9       7  9.91E-09  4.8105  6.4496  31.0523   59.9621  -1.6666   -1.9762  0.6055    2.7796   0.1848   0.5123   0.3029
  7  wt10     1              5.88       8  2.50E-09  5.2051  7.1251  30.8773   59.9395  -1.6770   -2.0719  0.6025    2.9935   0.1824   0.5119   0.3057
  8  wt10     1              6.86       8  4.59E-09  5.5868  7.7613  30.7144   59.9183  -1.6860   -2.1570  0.5995    3.1996   0.1803   0.5114   0.3083
  9  wt10     1              7.84       8  6.30E-09  5.9487  8.3371  30.5684   59.8989  -1.6936   -2.2303  0.5966    3.3959   0.1785   0.5107   0.3108
 10  wt10     1              8.82       8  6.73E-09  6.2858  8.8356  30.4429   59.8817  -1.6997   -2.2911  0.5941    3.5813   0.1770   0.5099   0.3131
 11  wt10     1               9.8       8  5.76E-09  6.5942  9.2451  30.3402   59.8669  -1.7043   -2.3394  0.5919    3.7556   0.1757   0.5090   0.3153
 12  wt10     1             10.78       8  4.01E-09  6.8717  9.5591  30.2614   59.8545  -1.7076   -2.3753  0.5902    3.9191   0.1747   0.5081   0.3173
 13  wt10     1             11.76       8  2.29E-09  7.1169  9.7757  30.2066   59.8446  -1.7097   -2.3996  0.5890    4.0727   0.1739   0.5071   0.3191
 14  wt10     1             12.74       8  1.06E-09  7.3313  9.8968  30.1749   59.8372  -1.7108   -2.4130  0.5883    4.2172   0.1732   0.5061   0.3207
 15  wt10     1             13.72       7  4.87E-09  7.5161  9.9273  30.1651   59.8322  -1.7110   -2.4164  0.5882    4.3535   0.1727   0.5052   0.3221
 16  wt10     1              14.7       7  1.64E-09  7.6738  9.8748  30.1753   59.8295  -1.7104   -2.4109  0.5886    4.4823   0.1724   0.5044   0.3232
 17  wt10     1             15.68       6  7.65E-09  7.8078  9.7474  30.2036   59.8288  -1.7092   -2.3973  0.5896    4.6042   0.1722   0.5037   0.3241
 18  wt10     1             16.66       6  1.65E-09  7.9225  9.5551  30.2477   59.8301  -1.7076   -2.3767  0.5912    4.7195   0.1720   0.5032   0.3248
 19  wt10     1             17.64       5  7.67E-09  8.0231  9.3071  30.3055   59.8332  -1.7057   -2.3498  0.5933    4.8284   0.1720   0.5028   0.3252
 20  wt10     1             18.62       5  6.39E-10  8.1156  9.0135  30.3748   59.8380  -1.7035   -2.3176  0.5959    4.9308   0.1720   0.5026   0.3254
...
222  wt10     2  M26         87.9       2  1.07E-10  0.3420  0.4113  32.7317   60.1958  -1.4577    0.4695  0.5360    1.4148   0.2899   0.4438   0.2663


Table 4.1.2. Output dataset of the mouse wt10 QTL mapping for separate sex generated by Program 4-1-2.

Obs  sex  trait  chr  marker  position  n_Iter  conv_err     LRT    Wald       ve  intercpt  additive   var_1  LRT_dist  freq_AA  freq_AB  freq_BB
  1  F    wt10     1  M1             0       2  9.06E-14  2.6893  2.7652  26.6041   61.6819   -1.6512  0.9860    0.3004   0.2414   0.5345   0.2241
  2  F    wt10     1              0.98       4  2.28E-10  2.9157  3.1669  26.4294   61.6791   -1.7713  0.9907    0.3599   0.2373   0.5399   0.2228
  3  F    wt10     1              1.96       5  1.02E-09  3.1583  3.6112  26.2388   61.6755   -1.8959  0.9954    0.4302   0.2331   0.5454   0.2215
  4  F    wt10     1              2.94       6  1.40E-09  3.4125  4.0871  26.0377   61.6711   -2.0213  0.9997    0.5091   0.2289   0.5509   0.2202
  5  F    wt10     1              3.92       7  1.08E-09  3.6713  4.5768  25.8339   61.6658   -2.1430  1.0034    0.5928   0.2249   0.5561   0.2191
  6  F    wt10     1               4.9       7  5.58E-09  3.9269  5.0582  25.6367   61.6598   -2.2564  1.0065    0.6773   0.2210   0.5608   0.2182
  7  F    wt10     1              5.88       8  1.51E-09  4.1715  5.5095  25.4545   61.6532   -2.3576  1.0089    0.7588   0.2174   0.5650   0.2176
  8  F    wt10     1              6.86       8  2.58E-09  4.3984  5.9123  25.2941   61.6459   -2.4445  1.0107    0.8344   0.2142   0.5685   0.2173
  9  F    wt10     1              7.84       8  3.00E-09  4.6017  6.2543  25.1594   61.6381   -2.5158  1.0120    0.9030   0.2113   0.5713   0.2174
 10  F    wt10     1              8.82       8  2.61E-09  4.7793  6.5289  25.0523   61.6299   -2.5718  1.0131    0.9645   0.2087   0.5735   0.2178
 11  F    wt10     1               9.8       8  1.80E-09  4.9288  6.7336  24.9731   61.6215   -2.6130  1.0140    1.0195   0.2063   0.5752   0.2185
 12  F    wt10     1             10.78       8  1.02E-09  5.0495  6.8684  24.9212   61.6128   -2.6403  1.0149    1.0695   0.2041   0.5764   0.2195
 13  F    wt10     1             11.76       7  5.17E-09  5.1416  6.9350  24.8956   61.6041   -2.6544  1.0160    1.1156   0.2021   0.5771   0.2207
 14  F    wt10     1             12.74       7  2.18E-09  5.2059  6.9363  24.8952   61.5953   -2.6563  1.0172    1.1592   0.2003   0.5775   0.2222
 15  F    wt10     1             13.72       7  7.26E-10  5.2439  6.8754  24.9186   61.5867   -2.6465  1.0187    1.2013   0.1986   0.5776   0.2238
 16  F    wt10     1              14.7       6  2.98E-09  5.2573  6.7560  24.9647   61.5782   -2.6257  1.0205    1.2429   0.1970   0.5774   0.2256
 17  F    wt10     1             15.68       6  5.56E-10  5.2485  6.5831  25.0317   61.5701   -2.5944  1.0225    1.2846   0.1955   0.5769   0.2276
 18  F    wt10     1             16.66       5  1.69E-09  5.2202  6.3619  25.1180   61.5623   -2.5533  1.0248    1.3270   0.1940   0.5762   0.2297
 19  F    wt10     1             17.64       4  4.77E-09  5.1758  6.0991  25.2212   61.5551   -2.5032  1.0273    1.3706   0.1927   0.5753   0.2320
 20  F    wt10     1             18.62       4  3.37E-11  5.1191  5.8027  25.3386   61.5485   -2.4448  1.0301    1.4160   0.1914   0.5742   0.2343
 21  F    wt10     1  M2          19.6       3  2.11E-11  5.0548  5.4814  25.4672   61.5427   -2.3796  1.0330    1.4636   0.1902   0.5730   0.2368
...
444  M    wt10     2  M26         87.9       3  6.14E-10  0.0187  0.1188  38.3229   58.7801   -0.3698  1.1507    4.1384   0.3465   0.3572   0.2963


Example 2: QTL mapping for discrete traits


This example uses real data from an F2 population scored for rice sheath blight disease (ZOU et al. 2000). There are 12 molecular markers distributed along two chromosomes covering 268 cM in length. The sample size is 119. The disease resistance of each individual is graded from grade 1 to grade 6. The primary dataset and the MAP dataset are named E2data and E2map, respectively. All datasets in examples 1-6 are copied to the SASUSER library when PROC QTL is installed. The following code invokes PROC QTL. Since RESISTENC is declared as a CLASS variable, it is treated as an ordered categorical variable in the analysis. Although up to two QTL effects may be defined in an F2 population, this example shows that the dominance effect can be ignored.
/* Program 4-2 */

proc qtl data=sasuser.E2data map=sasuser.E2map out=result method="Fisher" step=1.0;
   class resistenc;
   model resistenc = ;
   matingtype "F2";
   genotype A="1" B="3" H="2";
   estimate "a"=1 0 -1;
run;

The output is a SAS dataset named RESULT, which has 270 observations and 18 variables, as shown in Table 4.2.


Table 4.2. Output of the rice blight disease data generated by Program 4-2.

Obs  trait     chr  marker  position  n_Iter  conv_err      LRT     Wald  ve  intcpt_0  intcpt_1  intcpt_2  intcpt_3  intcpt_4  intcpt_5  intcpt_6        a   var_1
  1  resisten    1  RM245          0       4  1.81E-11   6.9127   6.9278   1    -1E+10   -1.1924   -0.5065   -0.0621    0.3423    1.3260     1E+10   0.3546  0.0181
  2  resisten    1                 1       4  7.56E-11   7.4031   7.4615   1    -1E+10   -1.1970   -0.5085   -0.0622    0.3441    1.3308     1E+10   0.3764  0.0190
  3  resisten    1                 2       4  2.26E-10   7.8886   7.9745   1    -1E+10   -1.2021   -0.5111   -0.0627    0.3453    1.3350     1E+10   0.3966  0.0197
  4  resisten    1                 3       4  4.64E-10   8.3600   8.4549   1    -1E+10   -1.2076   -0.5141   -0.0637    0.3460    1.3384     1E+10   0.4148  0.0204
  5  resisten    1                 4       4   7.1E-10   8.8072   8.8945   1    -1E+10   -1.2131   -0.5174   -0.0651    0.3461    1.3408     1E+10   0.4305  0.0208
  6  resisten    1                 5       4     9E-10   9.2202   9.2889   1    -1E+10   -1.2186   -0.5208   -0.0668    0.3457    1.3424     1E+10   0.4436  0.0212
  7  resisten    1                 6       4  1.02E-09   9.5894   9.6372   1    -1E+10   -1.2238   -0.5244   -0.0688    0.3448    1.3429     1E+10   0.4538  0.0214
  8  resisten    1                 7       4   1.1E-09   9.9067   9.9405   1    -1E+10   -1.2286   -0.5279   -0.0711    0.3434    1.3423     1E+10   0.4612  0.0214
  9  resisten    1                 8       4  1.15E-09  10.1657  10.2009   1    -1E+10   -1.2328   -0.5313   -0.0735    0.3415    1.3408     1E+10   0.4657  0.0213
 10  resisten    1                 9       4   1.2E-09  10.3624  10.4210   1    -1E+10   -1.2365   -0.5346   -0.0761    0.3392    1.3383     1E+10   0.4674  0.0210
 11  resisten    1                10       4  1.25E-09  10.4953  10.6027   1    -1E+10   -1.2395   -0.5378   -0.0788    0.3366    1.3347     1E+10   0.4661  0.0205
 12  resisten    1                11       4  1.28E-09  10.5655  10.7476   1    -1E+10   -1.2419   -0.5408   -0.0816    0.3335    1.3303     1E+10   0.4620  0.0199
 13  resisten    1  RM205         12       4  1.23E-09  10.5762  10.8567   1    -1E+10   -1.2437   -0.5436   -0.0845    0.3301    1.3249     1E+10   0.4549  0.0191
 14  resisten    1                13       4  2.32E-09  10.9574  11.1468   1    -1E+10   -1.2489   -0.5465   -0.0854    0.3314    1.3300     1E+10   0.4744  0.0202
 15  resisten    1                14       4  3.81E-09  11.3470  11.4234   1    -1E+10   -1.2546   -0.5497   -0.0864    0.3327    1.3352     1E+10   0.4942  0.0214
 16  resisten    1                15       4  5.48E-09  11.7428  11.6827   1    -1E+10   -1.2606   -0.5531   -0.0876    0.3339    1.3404     1E+10   0.5139  0.0226
 17  resisten    1                16       4  6.93E-09  12.1426  11.9211   1    -1E+10   -1.2669   -0.5568   -0.0889    0.3350    1.3458     1E+10   0.5334  0.0239
 18  resisten    1                17       4  7.73E-09  12.5436  12.1359   1    -1E+10   -1.2734   -0.5607   -0.0904    0.3361    1.3510     1E+10   0.5524  0.0251
 19  resisten    1                18       4  7.64E-09  12.9426  12.3253   1    -1E+10   -1.2801   -0.5647   -0.0921    0.3369    1.3562     1E+10   0.5708  0.0264
 20  resisten    1                19       4  6.84E-09  13.3361  12.4884   1    -1E+10   -1.2869   -0.5689   -0.0940    0.3376    1.3611     1E+10   0.5883  0.0277
...
270  resisten    2  RM20B        101       3  4.01E-09   0.8907   0.8856   1    -1E+10   -1.2106   -0.5360   -0.1000    0.2893    1.2422     1E+10  -0.1244  0.0175


Example 3: QTL mapping in a four-way cross design


This example shows QTL mapping in a four-way cross design. There are 202 individuals in the primary dataset. The first two individuals are the parents and the remaining 200 individuals are progeny. The phenotypic values of the two parents will not be analyzed in QTL mapping. The parents only provide the marker genotypes. If there are no phenotypic values for the parents, simply place zeros because these two phenotypic values will not be used anyway. The primary dataset and the MAP dataset are named E3data and E3map, respectively. All datasets that are used in examples 1-6 will be copied to the SASUSER library when PROC QTL is installed. The following statements perform QTL mapping for the FW design. Note that users may analyze a discrete trait (y2) as if it were a continuous trait in the analysis if the trait is not declared in the CLASS statement.
/* Program 4-3-1 */

proc qtl data=sasuser.E3data map=sasuser.E3map out=result method="irls" step=1.0;
   model y2 = ;
   estimate "a_m"=1 1 -1 -1 "a_f"=1 -1 1 -1;
   matingtype "fw";
run;

The output SAS dataset is named RESULT, which has 175 observations and 15 variables, as shown in Table 4.3.1.

The second trait in the primary dataset (y2) is a discrete trait. We can analyze this trait as an ordinal trait using the following statements. The map dataset is not used; thus, the MARKER statement is needed to list the markers to be included in the analysis.
/* Program 4-3-2 */

proc qtl data=sasuser.E3data out=result method="ml" step=1.0;
   class y2;
   model y2 = ;
   marker M1-M19;
   estimate "a_m"=1 1 -1 -1 "a_f"=1 -1 1 -1;
   matingtype "fw";
run;

The output SAS dataset named RESULT has 19 observations and 17 variables as shown in Table 4.3.2.


Table 4.3.1. Output of Program 4-3-1.

Obs  trait  chr  marker  position  n_Iter  Conv_err     LRT    Wald      ve  intercpt      a_m     a_f   var_1  cov_1_2   var_2
  1  y2       1  M1             0       1         0  3.3164  3.3440  1.1604    2.4892   0.0513  0.1305  0.0058  -0.0002  0.0060
  2  y2       1             0.927       3  1.79E-10  3.5537  3.4652  1.1582    2.4903   0.0570  0.1298  0.0058  -0.0002  0.0060
  3  y2       1             1.855       3  2.27E-09  3.7919  3.6031  1.1562    2.4915   0.0629  0.1289  0.0058  -0.0003  0.0060
  4  y2       1             2.782       3  8.40E-09  4.0310  3.7586  1.1543    2.4927   0.0689  0.1281  0.0058  -0.0004  0.0060
  5  y2       1             3.709       4  1.03E-10  4.2705  3.9324  1.1525    2.4940   0.0751  0.1273  0.0058  -0.0004  0.0060
  6  y2       1             4.636       4  1.61E-10  4.5085  4.1255  1.1508    2.4954   0.0815  0.1264  0.0058  -0.0005  0.0060
  7  y2       1             5.564       4  1.72E-10  4.7423  4.3386  1.1493    2.4968   0.0880  0.1256  0.0058  -0.0005  0.0060
  8  y2       1             6.491       4  1.25E-10  4.9681  4.5726  1.1480    2.4983   0.0947  0.1248  0.0058  -0.0006  0.0060
  9  y2       1             7.418       4  5.77E-11  5.1805  4.8286  1.1471    2.4999   0.1015  0.1240  0.0058  -0.0006  0.0059
 10  y2       1             8.345       3  3.87E-09  5.3732  5.1080  1.1465    2.5016   0.1085  0.1232  0.0058  -0.0006  0.0059
 11  y2       1             9.273       3  3.97E-10  5.5376  5.4124  1.1464    2.5034   0.1157  0.1226  0.0058  -0.0006  0.0059
 12  y2       1  M2          10.2       1         0  5.6634  5.7443  1.1468    2.5053   0.1231  0.1219  0.0058  -0.0006  0.0059
 13  y2       1            11.133       3  6.17E-10  5.9631  5.8674  1.1440    2.5064   0.1277  0.1201  0.0058  -0.0006  0.0059
 14  y2       1            12.067       3  5.70E-09  6.2343  5.9987  1.1415    2.5075   0.1323  0.1184  0.0058  -0.0006  0.0059
 15  y2       1                13       4  1.01E-10  6.4749  6.1381  1.1395    2.5085   0.1369  0.1167  0.0058  -0.0006  0.0059
 16  y2       1            13.933       4  1.50E-10  6.6823  6.2862  1.1379    2.5095   0.1414  0.1151  0.0058  -0.0005  0.0059
 17  y2       1            14.867       4  1.30E-10  6.8531  6.4436  1.1368    2.5105   0.1460  0.1135  0.0058  -0.0005  0.0059
 18  y2       1              15.8       4  6.59E-11  6.9829  6.6113  1.1363    2.5114   0.1506  0.1120  0.0058  -0.0005  0.0059
 19  y2       1            16.733       3  3.54E-09  7.0665  6.7906  1.1364    2.5122   0.1552  0.1106  0.0058  -0.0005  0.0059
 20  y2       1            17.667       3  3.29E-10  7.0971  6.9835  1.1372    2.5130   0.1599  0.1093  0.0058  -0.0005  0.0059
...
175  y2       2  M19         83.2       1         0  2.2702  2.2831  1.1665    2.5146  -0.0126  0.1158  0.0060  -0.0005  0.0059


Table 4.3.2. Output of Program 4-3-2.

Obs  trait  marker  n_Iter  conv_err      LRT     Wald  ve  intcpt_0  intcpt_1  intcpt_2  intcpt_3  intcpt_4      a_m      a_f   var_1  cov_1_2   var_2
  1  y2     M1           4  1.84E-11   3.6957   3.7435   1    -1E+10   -0.7519    0.0561    0.7242     1E+10  -0.0621  -0.1348  0.0058  -0.0001  0.0060
  2  y2     M2           3  6.72E-09   6.0040   6.0544   1    -1E+10   -0.7744    0.0377    0.7116     1E+10  -0.1292  -0.1253  0.0059  -0.0006  0.0060
  3  y2     M3           3  6.63E-09   6.9683   6.9752   1    -1E+10   -0.7866    0.0259    0.7061     1E+10  -0.1637  -0.1098  0.0059  -0.0004  0.0060
  4  y2     M4           4  9.59E-13   9.9955  10.0300   1    -1E+10   -0.7780    0.0405    0.7257     1E+10  -0.1874  -0.1445  0.0059  -0.0004  0.0061
  5  y2     M5           4  4.88E-12  17.4851  17.4554   1    -1E+10   -0.7941    0.0435    0.7426     1E+10  -0.1977  -0.2617  0.0059   0.0002  0.0060
  6  y2     M6           4  4.27E-10  32.3860  31.6667   1    -1E+10   -0.8480    0.0250    0.7588     1E+10  -0.3181  -0.3468  0.0062   0.0008  0.0062
  7  y2     M7          70  9.86E-09  27.3604  38.9494   1    -1E+10   -1.8158    0.0928    1.6222     1E+10  -0.4179  -1.6224  0.0145  -0.0060  0.1223
  8  y2     M8           4  1.68E-10  19.9627  19.5890   1    -1E+10   -0.8219    0.0162    0.7222     1E+10  -0.2248  -0.2782  0.0060   0.0005  0.0060
  9  y2     M9           4  1.22E-10  12.6076  12.5270   1    -1E+10   -0.7970    0.0228    0.7144     1E+10  -0.1708  -0.2213  0.0060   0.0003  0.0060
 10  y2     M10          3   6.7E-09   3.4708   3.4456   1    -1E+10   -0.7802    0.0211    0.6937     1E+10  -0.1125  -0.0969  0.0059   0.0005  0.0059
 11  y2     M11          3  5.54E-10   0.8295   0.8311   1    -1E+10   -0.7762    0.0242    0.6909     1E+10  -0.0229  -0.0659  0.0058   0.0001  0.0058
 12  y2     M12         22  6.95E-09   3.1222   3.1047   1    -1E+10   -0.7702    0.0360    0.7082     1E+10  -0.0431  -0.1397  0.0189  -0.0135  0.0193
 13  y2     M13          3  1.84E-09   3.8660   3.8527   1    -1E+10   -0.7732    0.0346    0.7068     1E+10  -0.0425  -0.1400  0.0058  -0.0005  0.0059
 14  y2     M14          3  3.82E-09   4.1467   4.0948   1    -1E+10   -0.7815    0.0243    0.6993     1E+10  -0.1318  -0.0755  0.0058  -0.0002  0.0058
 15  y2     M15          3   1.3E-09   2.2302   2.2114   1    -1E+10   -0.7655    0.0360    0.7074     1E+10  -0.0983  -0.0547  0.0059  -0.0002  0.0058
 16  y2     M16          3   8.5E-09   4.3384   4.2811   1    -1E+10   -0.7580    0.0469    0.7239     1E+10  -0.1382  -0.0727  0.0060  -0.0003  0.0058
 17  y2     M17          3  5.99E-10   1.5769   1.5711   1    -1E+10   -0.7582    0.0418    0.7115     1E+10  -0.0954  -0.0116  0.0060  -0.0005  0.0058
 18  y2     M18          3  9.85E-10   0.9288   0.9229   1    -1E+10   -0.7757    0.0218    0.6904     1E+10  -0.0212  -0.0691  0.0060  -0.0003  0.0058
 19  y2     M19          3  1.15E-09   2.4180   2.4229   1    -1E+10   -0.7773    0.0218    0.6933     1E+10   0.0173  -0.1188  0.0060  -0.0005  0.0058

32

Example 4: QTL mapping via the Bayesian method


This example analyzes a simulated dataset from a BC population for a continuous trait (WANG et al. 2005). There are 121 markers evenly distributed along a chromosome 2400 cM in length, and 498 individuals. The primary dataset and the MAP dataset are named E4data and E4map, respectively. All datasets used in examples 1-6 are copied to the SASUSER library when PROC QTL is installed. The following code invokes Bayesian QTL mapping and analyzes the posterior sample in 1 cM bins.
/* Program 4-4 */

proc qtl data=sasuser.E4data map=sasuser.E4map
         out=MCMC outpost=result/{step=1.0}
         method="bayes" genotype="expect" position="dynamic"
         coverage=25 burnin=2000 trimming=20 seed=0 posteriorsample=500;
   model trait=;
   genotype A="A" H="H";
   estimate 'a'=-1 1;
   matingtype 'BC';
run;

The output SAS dataset named RESULT has 2401 observations and 6 variables as shown in Table 4.4.
Table 4.4 Output of Program 4-4.

 Obs  chr  marker  position  count       a   var_1
   1    1  COL1           0      9  0.1417  0.0119
   2    1                 1     10  0.0787  0.0134
   3    1                 2     13  0.1064  0.0208
   4    1                 3     15  0.2197  0.0801
   5    1                 4     19  0.2256  0.0459
   6    1                 5     13  0.1863  0.0705
   7    1                 6     20  0.2990  0.0800
   8    1                 7     21  0.1705  0.0475
   9    1                 8     17  0.1505  0.1059
  10    1                 9     16  0.2484  0.0818
  11    1                10     14  0.3160  0.0760
  12    1                11     15  0.2122  0.0557
  13    1                12     21  0.2449  0.0587
  14    1                13     16  0.3824  0.4287
 ...
2401    1  COL121      2400      0  0.0000  0.0000
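Once RESULT is available, the bins that attract the most posterior mass are usually the first thing to inspect. The following is a minimal sketch in base SAS, assuming that RESULT has the variables shown in Table 4.4 and that count records how many posterior draws placed a QTL in each 1 cM bin. The dataset name TOPBINS and the choice of ten rows are illustrative only and are not part of PROC QTL.

```sas
/* Sketch only: rank the 1 cM bins of RESULT by posterior count.       */
/* Assumes the variables of Table 4.4 (chr, position, count, a, var_1). */
/* TOPBINS is a hypothetical dataset name.                              */
proc sort data=result out=topbins;
   by descending count;
run;

/* Print the ten bins with the largest counts */
proc print data=topbins(obs=10) noobs;
   var chr position count a var_1;
run;
```

Sorting into a separate dataset leaves RESULT in map order, which is convenient if the profile is to be plotted against position afterwards.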

Example 5: Estimating genomewide epistatic effects via the empirical Bayesian method
This example uses the mouse data (ATCHLEY et al. 1997) of Example 1 to demonstrate marker analysis for continuous traits via the empirical Bayesian method (eBayes). Epistatic effects can be estimated by turning on the INTERACTION option in the PROC QTL statement. The following code invokes the empirical Bayesian method.
/* Program 4-5 */

proc qtl data=sasuser.E1data map=sasuser.E1map out=result
         method='ebayes' interaction ebayesparm={-2, 0};
   model wt10= ;
   matingtype 'F2';
   genotype A1A1='A' A1A2='H' A2A2='B';
   estimate 'additive'=1 0 -1;
run;

The output SAS dataset named RESULT has 351 observations and 14 variables as shown in Table 4.5.

Table 4.5. Output of Program 4-5 for the empirical Bayes method.

Obs  chr1  marker1   pos1  chr2  marker2   pos2   u_additi  s_additi  v_additi  f_additi  n_Iter  conv_err        ve  intercpt
  1     1  M1           0     1  M1           0   4.33E-31     1E-30     1E-15  1.87E-31      20  6.57E-09   16.8822   63.1644
  2     1  M1           0     1  M2        19.6      -1.62  3.128471  0.709898  5.207614      20  6.57E-09   16.8822   63.1644
  3     1  M1           0     1  M3        25.5   -9.2E-32     1E-30     1E-15  8.44E-33      20  6.57E-09   16.8822   63.1644
  4     1  M1           0     1  M4        26.4   -9.2E-32     1E-30     1E-15  8.51E-33      20  6.57E-09   16.8822   63.1644
  5     1  M1           0     1  M5        42.9   5.45E-31     1E-30     1E-15  2.97E-31      20  6.57E-09   16.8822   63.1644
  6     1  M1           0     1  M6        48.7   4.62E-31     1E-30     1E-15  2.14E-31      20  6.57E-09   16.8822   63.1644
  7     1  M1           0     1  M7        50.1   4.96E-31     1E-30     1E-15  2.46E-31      20  6.57E-09   16.8822   63.1644
  8     1  M1           0     1  M8        62.9    5.9E-31     1E-30     1E-15  3.48E-31      20  6.57E-09   16.8822   63.1644
  9     1  M1           0     1  M9        69.6   2.63E-31     1E-30     1E-15  6.91E-32      20  6.57E-09   16.8822   63.1644
 10     1  M1           0     1  M10       71.7   2.84E-31     1E-30     1E-15  8.06E-32      20  6.57E-09   16.8822   63.1644
 11     1  M1           0     1  M11       73.9   1.18E-31     1E-30     1E-15   1.4E-32      20  6.57E-09   16.8822   63.1644
 12     1  M1           0     1  M12       77.9   2.57E-31     1E-30     1E-15  6.59E-32      20  6.57E-09   16.8822   63.1644
 13     1  M1           0     1  M13       85.4   4.39E-31     1E-30     1E-15  1.93E-31      20  6.57E-09   16.8822   63.1644
 14     1  M1           0     1  M14       98.7   8.99E-31     1E-30     1E-15  8.09E-31      20  6.57E-09   16.8822   63.1644
 15     1  M1           0     1  M15      101.7    8.3E-31     1E-30     1E-15  6.89E-31      20  6.57E-09   16.8822   63.1644
 16     1  M1           0     1  M16      104.2   5.67E-31     1E-30     1E-15  3.22E-31      20  6.57E-09   16.8822   63.1644
 17     1  M1           0     1  M17      110.5   2.86E-31     1E-30     1E-15  8.18E-32      20  6.57E-09   16.8822   63.1644
 18     1  M1           0     1  M18      121.6     -1E-31     1E-30     1E-15  1.02E-32      20  6.57E-09   16.8822   63.1644
 19     1  M1           0     2  M19          0   3.29E-32     1E-30     1E-15  1.08E-33      20  6.57E-09   16.8822   63.1644
 20     1  M1           0     2  M20       13.2   -6.3E-31     1E-30     1E-15  3.95E-31      20  6.57E-09   16.8822   63.1644
...
351     2  M26       87.9     2  M26       87.9   -6.1E-31     1E-30     1E-15  3.77E-31      20  6.57E-09  16.88219   63.1644
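In Table 4.5, all but one of the listed effects (u_additi) are on the order of 1E-31, that is, shrunk to numerical zero. A short DATA step along the following lines could be used to list only the rows that retain an effect. This is a sketch, not part of PROC QTL: the cut-off of 1E-8 and the dataset name NONZERO are assumptions chosen for illustration, and the variable names are taken from Table 4.5.

```sas
/* Sketch only: keep the rows of RESULT whose estimated effect was not  */
/* shrunk to (numerically) zero. The cut-off 1E-8 and the name NONZERO  */
/* are illustrative assumptions, not PROC QTL defaults.                 */
data nonzero;
   set result;
   where abs(u_additi) > 1e-8;
run;

proc print data=nonzero noobs;
   var chr1 marker1 pos1 chr2 marker2 pos2 u_additi f_additi;
run;
```

Applied to the rows shown in Table 4.5, only observation 2 (M1 with M2, u_additi = -1.62) would pass this filter.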

Example 6: Joint mapping of QTL for multiple traits


This example demonstrates how to perform joint mapping for multiple continuous traits using an interval mapping method and the Bayesian method. It uses simulated data from an F2 population. There are 241 markers dispersed evenly along a chromosome 2400 cM in length. Two continuous traits, y1 and y2, and all 241 markers were generated for a total of 500 individuals. The two traits were subsequently converted to ordinal traits, named yy1 and yy2, which have three and four categories, respectively. The primary dataset and the MAP dataset are named E6data and E6map, respectively.

In terms of program syntax, there is no significant difference between QTL mapping of multiple traits and that of a single trait. Users only need to specify multiple dependent variables in the MODEL statement. However, not all methods in the PROC QTL statement are valid for multiple-trait analysis. Please refer to the summary of the METHOD option in the PROC QTL statement for more details about the availability of methods for joint mapping of multiple traits. In the current version of PROC QTL, only two interval mapping methods, LS and ML, are valid for joint mapping of multiple continuous traits. The following code invokes the ML method to jointly map two continuous traits, y1 and y2.
/* Program 4-6-1 */

proc qtl data=sasuser.E6data map=sasuser.E6map out=RESULT method="ML" step=1.0;
   model y1 y2=;
   matingtype 'F2';
   genotype A1A1='1' A1A2='2' A2A2='3';
   estimate 'additive'=1 0 -1;
run;

The output SAS dataset named RESULT has 2401 observations and 16 variables as shown in Table 4.6.1. Multiple traits can also be mapped jointly using the Bayesian method. The following code performs joint mapping for the continuous trait y1 and the ordinal trait yy2 and directly provides the post-analysis result of the Bayesian analysis.

/* Program 4-6-2 */

proc qtl data=sasuser.E6data map=sasuser.E6map
         out=MCMC outpost=RESULT/{step=1.0}
         method="bayes" genotype="impute" position="dynamic"
         coverage=20 burnin=2000 trim=20 posteriorsample=300;
   class yy2;
   model y1 yy2=;
   matingtype 'F2';
   genotype A1A1='1' A1A2='2' A2A2='3';
   estimate 'additive'=1 0 -1;
run;

The output SAS dataset named RESULT has 2401 observations and 6 variables as shown in Table 4.6.2.

Table 4.6.1. Output of Program 4-6-1.

 Obs  chr1  marker1  position  n_Iter  conv_err      LRT      ve1   cov1_2      ve2  trait1     LRT1  intecpt1  additiv1  trait2    LRT2  intecpt2  additiv2
   1     1  M1              0       1  2.19E-09   7.6747  33.1105  -4.1286  30.8966  y1       7.6747   10.0396    0.9834  y2      0.7937    5.0295   -0.3077
   2     1                  1       1  4.63E-09   7.6940  33.1109  -4.1263  30.8951  y1       7.6940   10.0382    1.0031  y2      0.8144    5.0300   -0.3188
   3     1                  2       1  7.95E-09   7.6587  33.1146  -4.1250  30.8939  y1       7.6587   10.0369    1.0158  y2      0.8313    5.0305   -0.3282
   4     1                  3       2  7.01E-11   7.5653  33.1220  -4.1248  30.8930  y1       7.5653   10.0356    1.0209  y2      0.8439    5.0309   -0.3354
   5     1                  4       2  1.16E-10   7.4123  33.1331  -4.1259  30.8925  y1       7.4123   10.0344    1.0178  y2      0.8516    5.0314   -0.3402
   6     1                  5       2  1.68E-10   7.2006  33.1478  -4.1282  30.8924  y1       7.2006   10.0334    1.0062  y2      0.8541    5.0319   -0.3423
   7     1                  6       2  2.13E-10   6.9335  33.1659  -4.1317  30.8927  y1       6.9335   10.0325    0.9862  y2      0.8509    5.0323   -0.3415
   8     1                  7       2  2.15E-10   6.6171  33.1869  -4.1364  30.8934  y1       6.6171   10.0319    0.9581  y2      0.8416    5.0326   -0.3377
   9     1                  8       2  1.48E-10   6.2593  33.2102  -4.1421  30.8945  y1       6.2593   10.0314    0.9225  y2      0.8261    5.0329   -0.3309
  10     1                  9       2  5.08E-11   5.8703  33.2352  -4.1485  30.8959  y1       5.8703   10.0311    0.8804  y2      0.8046    5.0331   -0.3214
  11     1  M2             10       2  3.15E-14   5.4613  33.2611  -4.1556  30.8976  y1       5.4613   10.0311    0.8330  y2      0.7777    5.0332   -0.3095
  12     1                 11       2  6.33E-10   6.0297  33.2218  -4.1497  30.8985  y1       6.0297   10.0280    0.9020  y2      0.7274    5.0337   -0.3051
  13     1                 12       2  7.63E-09   6.6088  33.1819  -4.1448  30.8998  y1       6.6088   10.0248    0.9672  y2      0.6759    5.0340   -0.2986
  14     1                 13       3  3.92E-10   7.1809  33.1423  -4.1410  30.9015  y1       7.1809   10.0214    1.0260  y2      0.6258    5.0343   -0.2905
  15     1                 14       3  9.16E-10   7.7275  33.1043  -4.1387  30.9036  y1       7.7275   10.0182    1.0761  y2      0.5795    5.0346   -0.2815
  16     1                 15       3  1.13E-09   8.2309  33.0689  -4.1380  30.9060  y1       8.2309   10.0150    1.1159  y2      0.5384    5.0348   -0.2720
  17     1                 16       3  8.29E-10   8.6764  33.0372  -4.1388  30.9086  y1       8.6764   10.0121    1.1446  y2      0.5029    5.0350   -0.2625
  18     1                 17       3  3.65E-10   9.0531  33.0100  -4.1412  30.9113  y1       9.0531   10.0094    1.1619  y2      0.4729    5.0351   -0.2531
  19     1                 18       3  8.68E-11   9.3543  32.9879  -4.1451  30.9142  y1       9.3543   10.0070    1.1681  y2      0.4479    5.0353   -0.2440
  20     1                 19       2     3E-09   9.5779  32.9712  -4.1501  30.9171  y1       9.5779   10.0050    1.1640  y2      0.4272    5.0354   -0.2351
 ...
2401     1  M241         2400       2  4.85E-10  20.9651  32.5633  -4.7156  30.7649  y1      20.9651    9.9730    1.4156  y2      2.9831    4.9946    0.5889
Table 4.6.2. Output of Program 4-6-2.

 Obs  chr  marker  position  count  additiv1  additiv2
   1    1  M1             0     13   -0.0449   -0.3178
   2    1                 1     50   -0.0426   -0.3195
   3    1                 2     50    0.0149   -0.3640
   4    1                 3     35    0.0346   -0.2984
   5    1                 4     51    0.0683   -0.2998
   6    1                 5     38   -0.0547   -0.2630
   7    1                 6     20   -0.0762   -0.2712
   8    1                 7     18   -0.0375   -0.1729
   9    1                 8     19   -0.1490   -0.0951
  10    1                 9     15    0.0032   -0.1559
  11    1  M2            10     10   -0.0730   -0.0483
  12    1                11      8   -0.0166   -0.0294
  13    1                12      1   -0.0863   -0.0431
  14    1                13      9   -0.1531   -0.2603
  15    1                14     13   -0.2016   -0.2588
  16    1                15      7   -0.0920   -0.0806
  17    1                16      7   -0.1385   -0.2393
  18    1                17      3   -0.0925   -0.0966
  19    1                18      1   -0.3508   -0.6257
  20    1                19      4   -0.1891   -0.1460
  21    1  M3            20      3   -0.1143   -0.1384
  22    1                21      2   -0.4757   -0.3172
  23    1                22      2   -0.2336   -0.2880
  24    1                23      2   -0.3554   -0.0198
  25    1                24      3   -0.5092    0.0164
  26    1                25      5   -0.1678   -0.1048
  27    1                26      2   -0.3457   -0.2301
  28    1                27      8   -0.3280    0.0124
  29    1                28      6   -0.4372    0.0680
  30    1                29      6   -0.0316    0.0254
  31    1  M4            30     10   -0.3139    0.0296
  33    1                32      6   -0.2115    0.0282
  34    1                33      7   -0.4435    0.0230
  35    1                34      5   -0.2716    0.0951
  36    1                35     13   -0.3034    0.1016
  37    1                36      5   -0.3328    0.1946
  38    1                37      6   -0.1580    0.1198
  39    1                38     10   -0.1873    0.2163
 ...
2401    1  M241        2400      0    0.0000    0.0000

Example 7: Estimating genomewide QTL effects for discrete traits that follow some special distributions
The first part of this example uses simulated data to demonstrate how to perform QTL mapping for a discrete trait that follows a Poisson distribution. There are 500 individuals sampled from an F2 population. The genotypes of 481 markers and the phenotype of a Poisson-distributed trait have been simulated for each individual. The 481 markers are evenly dispersed along a 2400 cM chromosome. The primary dataset and the MAP dataset are named E7data1 and E7map1, respectively. The following code invokes the ML method.
/* Program 4-7-1 */

proc qtl data=SASUser.E7data1 map=SASUser.E7map1 out=result
         method="ML" distribution="poisson";
   model y=;
   matingtype "F2";
   genotype A="1" B="-1" H="0";
   Estimate "Add"=1 0 -1;
run;

The output SAS dataset named RESULT has 2401 observations and 12 variables as shown in Table 4.7.1. Please note that although the Poisson variable is discrete, users should NOT declare it in the CLASS statement.

The second part uses real data from an F2 population for the trait of wheat female sterility (DOU et al. 2009) to demonstrate QTL mapping for binomial traits. Female fertility was measured by the seed-setting ratio on fully pollinated spikes of 243 plants, that is, the ratio of the number of seed-setting spikelets (seed_setting) to the total number of spikelets (spikelets). There are 28 molecular markers distributed along 5 chromosomes measured for each individual. The primary dataset and the MAP dataset are named E7data2 and E7map2, respectively. The following code invokes the FISHER method.
/* Program 4-7-2 */

proc qtl data=SASUser.E7data2 map=SASUser.E7map2 out=result
         method="FISHER" distribution="binomial";
   model seed_setting/spikelets =;
   matingtype "F2";
   genotype A="A" B="B" H="H";
   Estimate "Add"=1 0 -1;
run;

The output SAS dataset named RESULT has 376 observations and 12 variables as shown in Table 4.7.2.

A pseudo variable named 'binary' was generated using the following rule to indicate whether or not the plant is completely sterile:

$$\text{binary} = \begin{cases} 1 & \text{for seed\_setting} > 0 \\ 0 & \text{for seed\_setting} = 0 \end{cases}$$

The variable 'binary' can be treated as a special binomial trait in which the number of trials is one for all individuals. In this situation, users can omit the trials from the MODEL statement, as shown below.
/* Program 4-7-3 */

proc qtl data=SASUser.E7data2 map=SASUser.E7map2 out=result
         method="FISHER" distribution="binomial";
   model binary =;
   matingtype "F2";
   genotype A="A" B="B" H="H";
   Estimate "Add"=1 0 -1;
run;

The output dataset named 'result' includes 376 observations and 12 variables as shown in Table 4.7.3.
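Program 4-7-3 reads the variable binary directly from E7data2. For users who need to build such an indicator in their own data, the rule that defines binary can be written as a short DATA step. The following is a sketch under stated assumptions: the output dataset name E7BINARY is hypothetical, and treating a missing seed_setting as a missing binary value is a choice made here, not something the manual specifies.

```sas
/* Sketch only: derive the 0/1 indicator from the seed-setting counts.  */
/* E7BINARY is a hypothetical dataset name; seed_setting is the         */
/* variable used in Program 4-7-2.                                      */
data e7binary;
   set SASUser.E7data2;
   if missing(seed_setting) then binary = .;   /* assumption: keep missing phenotypes missing */
   else binary = (seed_setting > 0);           /* 1 if any spikelet set seed, 0 otherwise     */
run;
```

A SAS comparison such as (seed_setting > 0) evaluates to 1 or 0, so no IF/ELSE ladder is needed for the two non-missing cases.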

Table 4.7.1. Output of Program 4-7-1.

 Obs  trait  chr  marker  position  n_Iter  conv_err      LRT     Wald  ve  intercpt     Add   var_1
   1  y        1  M1             0       3  5.64E-16  38.0860  37.6590   1    0.5246  0.3044  0.0025
   2  y        1                 1       3  2.35E-09  37.3180  37.1450   1    0.5249  0.3080  0.0026
   3  y        1                 2       4  1.98E-10  35.5820  35.6630   1    0.5259  0.3049  0.0026
   4  y        1                 3       4  8.45E-10  32.7950  32.8800   1    0.5280  0.2941  0.0026
   5  y        1                 4       4  6.97E-10  28.8170  28.4980   1    0.5314  0.2726  0.0026
   6  y        1  M2             5       2  4.34E-09  23.4580  23.2960   1    0.5374  0.2311  0.0023
   7  y        1                 6       3  1.14E-10  26.0970  25.8090   1    0.5368  0.2477  0.0024
   8  y        1                 7       3  8.83E-10  28.4490  28.1070   1    0.5366  0.2600  0.0024
   9  y        1                 8       3  8.49E-10  30.4250  30.0560   1    0.5365  0.2681  0.0024
  10  y        1                 9       3  1.17E-10  31.9600  31.5830   1    0.5367  0.2719  0.0023
  11  y        1  M3            10       3  1.43E-16  33.0010  32.7310   1    0.5373  0.2710  0.0022
  12  y        1                11       4  4.06E-09  42.2450  41.1980   1    0.5281  0.3275  0.0026
  13  y        1                12       4  4.34E-09  49.3450  48.8000   1    0.5230  0.3551  0.0026
  14  y        1                13       4  1.01E-09  54.5250  53.8520   1    0.5201  0.3679  0.0025
  15  y        1                14       4  3.42E-11  58.0440  57.0110   1    0.5188  0.3706  0.0024
  16  y        1  M4            15       3  2.14E-14  60.0470  58.9690   1    0.5190  0.3646  0.0023
 ...
2401  y        1  M481        2400          7.55E-16   0.4810   0.4810        0.5493  0.0339  0.0024

Table 4.7.2. Output of Program 4-7-2.

Obs  trait     chr  marker   position  n_Iter  conv_err     LRT    Wald  ve  intercpt      Add   var_1
  1  seed_set    1  Xwmc667      0.00       3  1.77E-10  194.99  187.47   1    0.7834   0.3403  0.0006
  2  seed_set    1               0.99       3  2.18E-10  197.86  190.73   1    0.7846   0.3513  0.0006
  3  seed_set    1               1.98       3  3.15E-10  200.68  193.95   1    0.7856   0.3624  0.0007
  4  seed_set    1               2.97       3  4.81E-10  203.41  197.13   1    0.7867   0.3735  0.0007
  5  seed_set    1               3.96       3  7.25E-10  206.05  200.22   1    0.7877   0.3847  0.0007
  6  seed_set    1               4.95       3  1.06E-09  208.57  203.22   1    0.7886   0.3958  0.0008
  7  seed_set    1               5.94       3  1.49E-09  210.95  206.08   1    0.7895   0.4066  0.0008
  8  seed_set    1               6.93       3  2.01E-09  213.17  208.79   1    0.7902   0.4173  0.0008
  9  seed_set    1               7.92       3  2.62E-09  215.20  211.32   1    0.7908   0.4275  0.0009
 10  seed_set    1               8.91       3  3.31E-09  217.02  213.64   1    0.7913   0.4373  0.0009
 11  seed_set    1               9.90       3  4.05E-09  218.62  215.72   1    0.7916   0.4465  0.0009
 12  seed_set    1              10.89       3  4.81E-09  219.98  217.56   1    0.7917   0.4550  0.0010
 13  seed_set    1              11.88       3  5.55E-09  221.08  219.11   1    0.7916   0.4628  0.0010
 14  seed_set    1              12.87       3  6.25E-09  221.91  220.37   1    0.7912   0.4696  0.0010
 15  seed_set    1              13.86       3  6.86E-09  222.45  221.31   1    0.7907   0.4753  0.0010
 16  seed_set    1              14.85       3  7.36E-09  222.70  221.93   1    0.7898   0.4800  0.0010
...
376  seed_set    5  cft21      155.66       2  6.60E-09    9.40    9.36   1    0.7022  -0.0792  0.0007
Table 4.7.3. Output of Program 4-7-3.

Obs  trait   chr  marker   position  n_Iter  conv_err   LRT  Wald  ve  intercpt      Add   var_1
  1  Binary    1  Xwmc667      0.00       4  7.77E-11  8.06  7.43   1    1.0918   0.3827  0.0197
  2  Binary    1               0.99       4  1.00E-10  8.10  7.50   1    1.0931   0.3935  0.0207
  3  Binary    1               1.98       4  1.32E-10  8.14  7.56   1    1.0944   0.4043  0.0216
  4  Binary    1               2.97       4  1.76E-10  8.17  7.62   1    1.0957   0.4150  0.0226
  5  Binary    1               3.96       4  2.40E-10  8.20  7.68   1    1.0968   0.4256  0.0236
  6  Binary    1               4.95       4  3.28E-10  8.21  7.73   1    1.0979   0.4360  0.0246
  7  Binary    1               5.94       4  4.48E-10  8.23  7.78   1    1.0988   0.4461  0.0256
  8  Binary    1               6.93       4  6.03E-10  8.23  7.82   1    1.0996   0.4557  0.0266
  9  Binary    1               7.92       4  7.96E-10  8.23  7.85   1    1.1001   0.4649  0.0275
 10  Binary    1               8.91       4  1.02E-09  8.22  7.88   1    1.1005   0.4734  0.0284
 11  Binary    1               9.90       4  1.28E-09  8.20  7.90   1    1.1006   0.4813  0.0293
 12  Binary    1              10.89       4  1.54E-09  8.18  7.91   1    1.1005   0.4883  0.0301
 13  Binary    1              11.88       4  1.80E-09  8.15  7.91   1    1.1001   0.4943  0.0309
 14  Binary    1              12.87       4  2.01E-09  8.11  7.91   1    1.0993   0.4994  0.0315
 15  Binary    1              13.86       4  2.17E-09  8.06  7.89   1    1.0983   0.5033  0.0321
 16  Binary    1              14.85       4  2.25E-09  8.00  7.86   1    1.0969   0.5061  0.0326
 17  Binary    1              15.84       4  2.24E-09  7.94  7.82   1    1.0951   0.5076  0.0329
 18  Binary    1              16.83       4  2.13E-09  7.87  7.78   1    1.0931   0.5078  0.0332
 19  Binary    1              17.82       4  1.93E-09  7.79  7.72   1    1.0906   0.5066  0.0333
 20  Binary    1              18.81       4  1.68E-09  7.70  7.64   1    1.0879   0.5041  0.0332
 21  Binary    1              19.80       4  1.38E-09  7.61  7.56   1    1.0848   0.5002  0.0331
 22  Binary    1              20.79       4  1.08E-09  7.51  7.47   1    1.0814   0.4950  0.0328
 23  Binary    1              21.79       4  8.05E-10  7.40  7.36   1    1.0778   0.4885  0.0324
 24  Binary    1              22.78       4  5.65E-10  7.28  7.24   1    1.0740   0.4808  0.0319
 25  Binary    1              23.77       4  3.74E-10  7.15  7.11   1    1.0699   0.4719  0.0313
 26  Binary    1              24.76       4  2.31E-10  7.02  6.97   1    1.0658   0.4620  0.0306
 27  Binary    1              25.75       4  1.34E-10  6.88  6.82   1    1.0616   0.4512  0.0298
 28  Binary    1              26.74       4  7.15E-11  6.73  6.66   1    1.0573   0.4396  0.0290
 29  Binary    1              27.73       4  3.50E-11  6.57  6.50   1    1.0531   0.4273  0.0281
 30  Binary    1              28.72       4  1.56E-11  6.41  6.32   1    1.0489   0.4145  0.0272
...
376  Binary    5  cft21      155.66       3  6.35E-11  0.37  0.36   1    0.9867  -0.0853  0.0200

Example 8: Composite interval mapping using PROC QTL


PROC QTL does not directly support composite interval mapping (CIM, ZENG 1994), but with some extra coding, users can perform CIM. Users can create a SAS macro to perform CIM analysis using PROC QTL. In this example, we provide a SAS macro to allow users to perform CIM. The following code will invoke CIM analysis in two steps: 1) select marker cofactors according to the result of marker analysis; 2) perform CIM analysis using cofactors selected in the first step.
/* Program 4-8 */

data map; set SASUser.E7map1; run;
data data; set SASUser.E7data1; run;

/* Step 1: estimate effects of markers by marker analysis */
proc qtl data=out.data map=out.map out=lrtmarker method="fisher" step=100;
   model y1= ;
   matingtype "F2";
   genotype A="1" B="-1" H="0";
   Estimate "Add"=1 0 -1;
run;

/* Select as cofactors the markers whose LRT exceeds the threshold */
proc iml;
   use lrtmarker;
   read all var {marker LRT Position Chr};
   free index;
   do i=1 to nROW(LRT);
      /* the threshold to select cofactors */
      if (LRT[i]>80) then index=index//i;
   end;
   create cofactor var {marker position chr};
   marker=marker[index];
   position=position[index];
   chr=chr[index];
   append;
quit; run;

/* Step 2: scan each marker interval, using the selected cofactors */
/* other than the two flanking markers                             */
%macro CIM;
   %do M=1 %to 480;
      proc iml;
         Call SYMPUTX ("M1", %str(compress("M"+char(&M))));
         Call SYMPUTX ("M2", %str(compress("M"+char(&M+1))));
         use cofactor;
         read all var {marker chr};
         %put M1 &M1 M2 &M2;
         CoMark="";
         do i=1 to nrow(marker);
            if ("&M1"^=marker[i])&("&M2"^=marker[i]) then CoMark=CoMark+(marker[i]);
         end;
         Call SYMPUTX ("CoFact", %str(coMark));
      quit; run;

      /* keep only the two markers that flank the current interval */
      data map;
         set out.map;
         if (mark="&M1") | (mark="&M2");
      run;

      proc qtl data=data map=map out=tmp method="fisher" step=1;
         class &CoFact;
         model y1= &CoFact;
         matingtype "F2";
         genotype A="1" B="-1" H="0";
         Estimate "Add"=1 0 -1;
      run;

      proc append base=RESULT data=tmp FORCE; run;
   %end;
%mend;
%CIM;

The output dataset contains 2880 observations and 42 variables, which includes 30 cofactors from 15 markers. For convenience of coding, all markers except for the first and the last are calculated twice in this code. However, users may avoid this duplication by adding the RANGE statement in PROC QTL.
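If the duplicated marker positions are unwanted and the analysis has already been run, they can also be removed afterwards. The following is a minimal sketch in base SAS, assuming that RESULT carries the chr and position variables seen in the other FISHER outputs (for example Table 4.7.2). The dataset name RESULT_UNIQUE is hypothetical.

```sas
/* Sketch only: drop repeated scan positions from the appended RESULT.  */
/* Assumes RESULT contains chr and position; RESULT_UNIQUE is a         */
/* hypothetical dataset name.                                           */
proc sort data=result out=result_unique nodupkey;
   by chr position;
run;
```

NODUPKEY keeps one observation per chr and position combination. The two estimates at a shared marker come from adjacent intervals, which may use different cofactor sets, so they need not be identical. Users who care which of the two is retained should inspect both before discarding one.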


Example 9: Permutation for the Bayesian shrinkage analysis


There are two permutation strategies in the Bayesian shrinkage analysis: permutation outside the Markov chain and permutation inside the Markov chain (CHE and XU 2010). PROC QTL provides no dedicated option for the first strategy, because users can simply permute the data themselves and call PROC QTL to analyze the permuted data. The second strategy (permutation inside the Markov chain) is supported by PROC QTL: users can turn on the PERMUTATION option in the PROC QTL statement to generate the posterior sample from the permuted data. The posterior distribution of each parameter then mimics its null distribution. The following code analyzes the permuted data and generates the null distribution for each parameter.
/* Program 4-9 */

proc qtl data=sasuser.E4data map=sasuser.E4map
         out=mcmcsample outpost=post/{quantile={0.005 0.025 0.975 0.995}}
         method="bayes" genotype="expect" position="static"
         coverage=25 burnin=2000 trimming=10 seed=0 posteriorsample=1000
         permutation;
   model trait=;
   genotype A="A" H="H";
   estimate 'a'=-1 1;
   matingtype 'BC';
run;

The output dataset named post includes 96 observations and 10 variables as shown in Table 4.9 below. The last four variables (Q1_1 to Q1_4) represent the 0.5, 2.5, 97.5 and 99.5 percentiles of the posterior samples.
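For the first strategy (permutation outside the Markov chain), the data have to be shuffled before PROC QTL is called. One way this might be done in base SAS is sketched below. It assumes that trait is the only phenotype column in E4data and that the genotypes should stay in their original rows. The dataset names PHENO and PERMUTED and the seed value are hypothetical.

```sas
/* Sketch only: shuffle the phenotype while leaving genotypes in place. */
/* PHENO and PERMUTED are hypothetical names; the seed is arbitrary.    */
data pheno;
   set sasuser.E4data(keep=trait);
   u = ranuni(12345);              /* random sort key */
run;

proc sort data=pheno;
   by u;
run;

data permuted;
   /* one-to-one merge (no BY statement): row i of the genotypes is */
   /* paired with row i of the shuffled phenotypes                  */
   merge sasuser.E4data(drop=trait) pheno(drop=u);
run;
```

PERMUTED can then be supplied through the DATA= option in place of sasuser.E4data, with the PERMUTATION option left off. Repeating the shuffle with different seeds gives one permuted analysis per seed.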

Table 4.9 Output of Program 4-9.

Obs  chr  marker  position  count        a   var_1     Q1_1     Q1_2    Q1_3    Q1_4
  1    1              12.5   1000  -0.0177  0.0601  -1.4888  -0.5774  0.3256  1.0308
  2    1              37.5   1000  -0.0031  0.0298  -0.9477  -0.2730  0.1864  0.9452
  3    1              62.5   1000   0.0012  0.0455  -1.1648  -0.3751  0.4248  0.9222
  4    1              87.5   1000  -0.0017  0.0365  -1.1433  -0.3094  0.3434  0.9824
  5    1             112.5   1000   0.0021  0.0451  -1.1313  -0.4514  0.4194  1.1619
  6    1             137.5   1000   0.0049  0.0199  -0.6139  -0.0923  0.1664  0.9805
  7    1             162.5   1000   0.0004  0.0351  -0.9348  -0.2300  0.2177  1.0350
  8    1             187.5   1000  -0.0012  0.0273  -0.9603  -0.2001  0.1184  1.2596
  9    1             212.5   1000   0.0124  0.0381  -0.8040  -0.2210  0.3988  1.3504
 10    1             237.5   1000  -0.0033  0.0324  -0.8677  -0.3894  0.2549  1.0250
 11    1             262.5   1000   0.0098  0.0406  -0.8971  -0.2022  0.4294  1.2351
 12    1             287.5   1000   0.0022  0.0290  -0.7955  -0.2117  0.2380  0.8539
 13    1             312.5   1000  -0.0032  0.0344  -1.0110  -0.3552  0.2905  0.9551
 14    1             337.5   1000   0.0065  0.0397  -0.8985  -0.2967  0.3468  1.3176
 15    1             362.5   1000  -0.0057  0.0367  -1.1300  -0.4153  0.2819  1.0034
 16    1             387.5   1000   0.0098  0.0542  -1.0089  -0.3598  0.4822  1.4772
 17    1             412.5   1000   0.0035  0.0395  -0.9550  -0.3672  0.3750  1.1798
 18    1             437.5   1000  -0.0007  0.0317  -0.9669  -0.3305  0.2667  0.9534
 19    1             462.5   1000  -0.0067  0.0489  -1.2084  -0.5667  0.4427  1.0023
 20    1             487.5   1000  -0.0047  0.0410  -1.1607  -0.3466  0.2027  0.9376
 21    1             512.5   1000  -0.0011  0.0650  -1.3829  -0.5293  0.5804  1.1739
 22    1             537.5   1000  -0.0016  0.0317  -0.9768  -0.3057  0.2744  1.0987
 23    1             562.5   1000   0.0045  0.0126  -0.5987  -0.0645  0.1752  0.6525
 24    1             587.5   1000  -0.0006  0.0325  -0.9279  -0.3793  0.3702  0.8921
 25    1             612.5   1000   0.0044  0.0340  -0.9805  -0.3345  0.3615  1.0276
 26    1             637.5   1000   0.0066  0.0417  -1.0039  -0.3614  0.4175  1.0229
 27    1             662.5   1000  -0.0073  0.0356  -1.0884  -0.3679  0.2829  0.9232
 28    1             687.5   1000  -0.0007  0.0495  -1.1536  -0.4222  0.4656  0.8986
 29    1             712.5   1000  -0.0023  0.0425  -1.0611  -0.3268  0.2902  1.0233
 30    1             737.5   1000  -0.0041  0.0392  -1.0285  -0.3175  0.2239  1.2047
 31    1             762.5   1000  -0.0018  0.0450  -1.0455  -0.3505  0.2830  1.2809
 32    1             787.5   1000   0.0083  0.0481  -1.1236  -0.2784  0.4541  1.2732
 33    1             812.5   1000  -0.0156  0.0493  -1.2771  -0.5003  0.3509  0.8893
 34    1             837.5   1000   0.0053  0.0337  -0.8350  -0.3215  0.3717  1.1037
 35    1             862.5   1000   0.0004  0.0295  -0.8416  -0.2996  0.2918  0.9336
 36    1             887.5   1000   0.0078  0.0290  -0.6803  -0.2570  0.3356  1.1436
 37    1             912.5   1000  -0.0015  0.0484  -1.3270  -0.3704  0.4736  1.0440
 38    1             937.5   1000  -0.0108  0.0364  -1.1106  -0.3686  0.1809  0.8488
...
 96    1            2387.5   1000   0.0100  0.0409  -1.0118  -0.2834  0.4688  1.1288

References:
Atchley, W. R., S. Xu and D. E. Cowley, 1997 Altering developmental trajectories in mice by restricted index selection. Genetics 146: 629-640.
Che, X., and S. Xu, 2010 Significance test and genome selection in Bayesian shrinkage analysis. International Journal of Plant Genomics 2010: 11. doi: 10.1155/2010/893206.
Dou, B., B. Hou, H. Xu, X. Lou, X. Chi et al., 2009 Efficient mapping of a female sterile gene in wheat (Triticum aestivum L.). Genetical Research 91: 337-343 doi: 10.1017/S0016672309990218.
Haley, C. S., and S. A. Knott, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315-324.
Han, L., and S. Xu, 2008 A Fisher scoring algorithm for the weighted regression method of QTL mapping. Heredity 101: 453-464 doi: 10.1038/hdy.2008.78.
Lan, H., M. Chen, J. Flowers, B. Yandell, D. Stapleton et al., 2006 Combined expression trait correlations and expression quantitative trait locus mapping. PLoS Genetics 2: e6.
Lander, E. S., and D. Botstein, 1989 Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185-199.
SAS Institute Inc., 1991 SAS/TOOLKIT Software: Usage and Reference, Version 6, First Edition. Cary, NC: SAS Institute Inc.
Wang, H., Y. Zhang, X. Li, G. L. Masinde, S. Mohan et al., 2005 Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics 170: 465-480 doi: 10.1534/genetics.104.039354.
Wang, S., C. J. Basten and Z. B. Zeng, 2007 Windows QTL Cartographer 2.5. Department of Statistics, North Carolina State University, Raleigh, NC (http://statgen.ncsu.edu/qtlcart/WQTLCart.htm).
Xu, S., 1998a Further investigation on the regression method of mapping quantitative trait loci. Heredity 80: 364-373.
Xu, S., 1998b Iteratively reweighted least squares mapping of quantitative trait loci. Behavior Genetics 28: 341-355.
Xu, S., 2003 Estimating Polygenic Effects Using Markers of the Entire Genome. Genetics 163: 789-801.
Xu, S., 2007 An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics 63: 513-521 doi: 10.1111/j.1541-0420.2006.00711.x.
Xu, S., 2008 Quantitative trait locus mapping can benefit from segregation distortion. Genetics 180: 2201-2208 doi: 10.1534/genetics.108.090688.
Xu, S., and Z. Hu, 2009 Mapping quantitative trait loci using distorted markers. International Journal of Plant Genomics 2009: Article ID 410825, 11 doi: 10.1155/2009/410825.
Zeng, Z. B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457-1468.
Zou, J. H., X. B. Pan, Z. X. Chen, J. Y. Xu, J. F. Lu et al., 2000 Mapping quantitative trait loci controlling sheath blight resistance in two rice cultivars (Oryza sativa L.). Theoretical and Applied Genetics 101: 569-573 doi: 10.1007/s001220051517.
