Performs a QTL backward selection for loci in multi-environment trials or multiple populations (M.P. Boer, M. Malosetti, S.J. Welham & J.T.N.M. Thissen).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | What to print (`summary`, `model`, `components`, `effects`, `means`, `stratumvariances`, `monitoring`, `vcovariance`, `deviance`, `Waldtests`, `missingvalues`, `covariancemodels`); default `summ` |
| `POPULATIONTYPE` = string token | Type of population (`BC1`, `DH1`, `F2`, `RIL`, `BCxSy`, `CP`); must be set |
| `ALPHALEVEL` = scalar | Defines a significance level; default 0.05 |
| `VCMODEL` = string token | Defines the variance-covariance model for the set of environments (`identity`, `diagonal`, `cs`, `hcs`, `outside`, `fa`, `fa2`, `unstructured`); default `cs` for multi-environment trials, and `diagonal` for multiple populations |
| `VCPARAMETERS` = string token | Whether to re-estimate the variance-covariance model parameters (`estimate`, `fix`); default `esti` |
| `VCSELECT` = string token | Whether to re-select the variance-covariance model (`no`, `yes`); default `no` |
| `CRITERION` = string token | Criterion to use for model selection (`aic`, `sic`); default `sic` |
| `FIXED` = formula | Defines extra fixed effects |
| `UNITFACTOR` = factor | Saves the units factor required to define the random model when `UNITERROR` is to be used |
| `MVINCLUDE` = string tokens | Whether to include units with missing values in the explanatory factors and variates and/or the y-variates (`explanatory`, `yvariate`); default `expl`, `yvar` |
| `MAXCYCLE` = scalar | Limit on the number of iterations; default 100 |
| `WORKSPACE` = scalar | Number of blocks of internal memory to be set up for use by the `REML` algorithm; default 100 |

### Parameters

| Parameter | Description |
|---|---|
| `TRAIT` = variates | Quantitative trait to be analysed; must be set |
| `GENOTYPES` = factors | Genotype factor; must be set |
| `ENVIRONMENTS` = factors | Environment factor; must be set for a multi-environment trial |
| `POPULATIONS` = factors | Population factor; must be set for a multiple-population analysis |
| `UNITERROR` = variates | Uncertainty on trait means (derived from individual unit or plot error) to be included in QTL analysis; default `*` i.e. omitted |
| `VCINITIAL` = pointers | Initial values for the parameters of the variance-covariance model |
| `SELECTEDMODEL` = texts | `VCMODEL` setting for the selected covariance structure |
| `ADDITIVEPREDICTORS` = pointers | Additive genetic predictors; must be set |
| `ADD2PREDICTORS` = pointers | Second (paternal) set of additive genetic predictors |
| `DOMINANCEPREDICTORS` = pointers | Dominance genetic predictors |
| `CHROMOSOMES` = factors | Chromosomes corresponding to the genetic predictors; must be set |
| `POSITIONS` = variates | Positions on the chromosomes corresponding to the genetic predictors; must be set |
| `IDLOCI` = texts | Labels for the loci |
| `IDMGENOTYPES` = texts | Labels for the genotypes corresponding to the genetic predictors |
| `QTLCANDIDATES` = variates | Specifies the locus index numbers from which to start the selection; must be set |
| `QTLSELECTED` = variates | Saves the index numbers of the selected QTLs |
| `INTERACTIONS` = variates | Saves a logical variate indicating whether each selected QTL showed a significant (1) or non-significant (0) QTL-by-environment or QTL-by-population interaction |
| `DOMSELECTED` = variates | Saves a logical variate indicating whether each selected QTL showed a significant (1) or non-significant (0) effect of the `DOMINANCEPREDICTORS` |
| `DOMINTERACTIONS` = variates | Saves a logical variate indicating whether each selected QTL showed a significant (1) or non-significant (0) dominance-by-environment or dominance-by-population interaction |
| `WALDSTATISTICS` = variates | Saves the Wald test statistics |
| `PRWALD` = variates | Saves the associated Wald probabilities |

### Description

`QMBACKSELECT` selects QTLs by backward selection from a list of candidate QTLs (loci) in multi-environment trials. Alternatively, it can analyse data from multiple populations. It uses means per genotype-environment or genotype-population combinations as phenotypic data, but weights can be attached to the means (see the `UNITERROR` parameter and the `UNITFACTOR` option below). The response variable must be specified by the `TRAIT` parameter, and the corresponding environment and genotype factors must be specified by the `ENVIRONMENTS` and `GENOTYPES` parameters, respectively. The `POPULATIONTYPE` option must be set to specify the population from which the genotypes have been derived. For a multiple-population analysis, the `POPULATIONS` parameter should be set (to a factor) instead of `ENVIRONMENTS`.
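As a rough illustration of the multiple-population form of the call, the following sketch uses only options and parameters documented on this page. All identifiers (`yld`, `G`, `Pop`, `addpred`, `mkchr`, `mkpos`, `Qid`, `qtlsel`) are hypothetical and assumed to have been set up beforehand. The choice of `DH1` is likewise only an assumption for the sake of the example.

```genstat
" Sketch only: all identifiers below are hypothetical. "
" POPULATIONS replaces ENVIRONMENTS; VCMODEL is left at its default, "
" which is diagonal for multiple populations. "
QMBACKSELECT [PRINT=summary; POPULATIONTYPE=DH1]\
             TRAIT=yld; GENOTYPES=G; POPULATIONS=Pop;\
             ADDITIVEPREDICTORS=addpred; CHROMOSOMES=mkchr;\
             POSITIONS=mkpos; QTLCANDIDATES=Qid; QTLSELECTED=qtlsel
```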

Molecular information must be provided in the form of additive genetic predictors stored in variates and supplied, in a pointer, by the `ADDITIVEPREDICTORS` parameter. Non-additive effects can be included in the model by specifying dominance genetic predictors using the `DOMINANCEPREDICTORS` parameter (e.g. in an F2 population). In the case of segregating F1 populations (outbreeders), two sets of additive genetic predictors must be specified: the maternal ones by the `ADDITIVEPREDICTORS` parameter, and the paternal ones by the `ADD2PREDICTORS` parameter. The corresponding map information for the genetic predictors must be given by the `CHROMOSOMES` and `POSITIONS` parameters. The labels for the loci can be supplied by the `IDLOCI` parameter, and the labels for the genotypes in the marker data can be supplied by the `IDMGENOTYPES` parameter. If `IDMGENOTYPES` is set, the match between the genotypes in the phenotypic and in the marker data will be checked.

The set of candidate QTLs must be supplied by the `QTLCANDIDATES` parameter. The model assumes `ENVIRONMENTS` (or `POPULATIONS`) as a fixed term, and `GENOTYPES` as a random term. Extra fixed effects can be defined using the `FIXED` option. A multi-Normal distribution is assumed for the random genetic effects, with mean vector 0 and variance-covariance matrix Σ. The `VCMODEL` option defines the model to use for Σ. See the `VGESELECT` procedure for details of the available models; the default is to use compound symmetry for multi-environment trials, and diagonal for multiple populations. Initial values for the parameters in the variance-covariance model can be specified by the `VCINITIAL` parameter. The `VCPARAMETERS` option controls whether the variance-covariance parameters are re-estimated at each step of the backward selection (`VCPARAMETERS=estimate`), or whether they are fixed at the defined initial values (`VCPARAMETERS=fix`). The `VCSELECT` option defines whether an extra check is made at each step on the variance-covariance model, to assess whether a simpler model is more suitable than the current model (based on the criterion defined by the `CRITERION` option). The `SELECTEDMODEL` parameter stores the final variance-covariance model that is selected. The significance level to use at each step of the backward selection process is given by the `ALPHALEVEL` option (default 0.05).
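The variance-covariance options described above might be combined as in the following sketch. It reuses the identifiers from the F2 maize example at the end of this page; `selmodel` is a hypothetical name for the saved text. The settings shown (starting from `unstructured`, using `aic`, and an `ALPHALEVEL` of 0.01) are illustrative choices, not recommendations.

```genstat
" Sketch only: selmodel is a hypothetical identifier. "
" Start from a rich model for the covariance matrix and let VCSELECT "
" check at each step whether a simpler model is more suitable. "
QMBACKSELECT [PRINT=summary; POPULATIONTYPE=F2; ALPHALEVEL=0.01;\
             VCMODEL=unstructured; VCPARAMETERS=estimate;\
             VCSELECT=yes; CRITERION=aic]\
             TRAIT=yld; ENVIRONMENTS=E; GENOTYPES=G;\
             ADDITIVEPREDICTORS=addpred; DOMINANCEPREDICTORS=dompred;\
             CHROMOSOMES=mkchr; POSITIONS=mkpos; QTLCANDIDATES=Qid;\
             SELECTEDMODEL=selmodel; QTLSELECTED=qtlsel
" Display the VCMODEL setting that was finally selected. "
PRINT        selmodel
```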

The `MVINCLUDE`, `MAXCYCLE` and `WORKSPACE` options operate in the same way as these options of the `REML` directive. The `UNITERROR` parameter allows uncertainty on the trait means (derived from individual unit or plot error) to be specified to include in the random model; by default this is omitted. The `UNITFACTOR` option allows the factor that is needed to define the unit-error term to be saved (this would be needed, for example, to save information later about the term using `VKEEP`).
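A call that attaches uncertainty to the trait means could look like the sketch below. The variate `uerr` (assumed to hold the uncertainty values for the trait means) and the factor name `ufac` are hypothetical; the other identifiers follow the F2 maize example at the end of this page.

```genstat
" Sketch only: uerr and ufac are hypothetical identifiers. "
" UNITERROR adds the unit-error term to the random model; UNITFACTOR "
" saves the factor that defines that term for later use. "
QMBACKSELECT [PRINT=summary; POPULATIONTYPE=F2; UNITFACTOR=ufac]\
             TRAIT=yld; ENVIRONMENTS=E; GENOTYPES=G; UNITERROR=uerr;\
             ADDITIVEPREDICTORS=addpred; CHROMOSOMES=mkchr;\
             POSITIONS=mkpos; QTLCANDIDATES=Qid; QTLSELECTED=qtlsel
```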

The `PRINT` option specifies the output to be displayed. The `summary` setting prints the information about the QTLs retained in the model, and the other settings correspond to those in the `PRINT` option of the `REML` directive.

The list of selected QTLs can be saved by the `QTLSELECTED` parameter, and a logical variate that indicates whether each selected QTL showed a significant QTL-by-environment (or QTL-by-population) interaction can be saved by the `INTERACTIONS` parameter. This interaction is the combined effect of the `ADDITIVEPREDICTORS`, `ADD2PREDICTORS` and `DOMINANCEPREDICTORS` pointers, if specified. After the final step of the backward selection, extra tests are performed if the `DOMINANCEPREDICTORS` parameter is set. If the selected QTL has no interaction effect with environment (or population), a test is performed of whether the dominance effect makes a significant contribution to the combined QTL effect. If dominance is significant, the corresponding units of the logical variate saved by the `DOMSELECTED` parameter are set to one; the other units are set to zero. If the selected QTL has a significant interaction with environment (or population), a test is performed of whether the dominance-by-environment (or dominance-by-population) interaction makes a significant contribution to the combined QTL-by-environment (or QTL-by-population) interaction. If the dominance-by-environment (or dominance-by-population) interaction is significant, the corresponding units of the logical variate saved by the `DOMINTERACTIONS` parameter are set to one; the other units are set to zero. The Wald test statistics and associated probability values for the combined effects of the selected QTLs (including any non-significant dominance and dominance-by-environment or dominance-by-population interactions) can be saved by the `WALDSTATISTICS` and `PRWALD` parameters, respectively.

Options: `PRINT`, `POPULATIONTYPE`, `ALPHALEVEL`, `VCMODEL`, `VCPARAMETERS`, `VCSELECT`, `CRITERION`, `FIXED`, `UNITFACTOR`, `MVINCLUDE`, `MAXCYCLE`, `WORKSPACE`.

Parameters: `TRAIT`, `GENOTYPES`, `ENVIRONMENTS`, `POPULATIONS`, `UNITERROR`, `VCINITIAL`, `SELECTEDMODEL`, `ADDITIVEPREDICTORS`, `ADD2PREDICTORS`, `DOMINANCEPREDICTORS`, `CHROMOSOMES`, `POSITIONS`, `IDLOCI`, `IDMGENOTYPES`, `QTLCANDIDATES`, `QTLSELECTED`, `INTERACTIONS`, `DOMSELECTED`, `DOMINTERACTIONS`, `WALDSTATISTICS`, `PRWALD`.

### Method

`QMBACKSELECT` starts with the following mixed models, which include a set $L$ of candidate QTLs:

1) if only `ADDITIVEPREDICTORS` are specified

$$y_{ij} = \mu + E_j + \sum_{l \in L} x_{il}^{add}\,\alpha_{jl}^{add} + GE_{ij}$$

2) if `DOMINANCEPREDICTORS` are also specified

$$y_{ij} = \mu + E_j + \sum_{l \in L} \left( x_{il}^{add}\,\alpha_{jl}^{add} + x_{il}^{dom}\,\alpha_{jl}^{dom} \right) + GE_{ij}$$

3) if both `ADD2PREDICTORS` and `DOMINANCEPREDICTORS` are specified (for population type `CP`)

$$y_{ij} = \mu + E_j + \sum_{l \in L} \left( x_{il}^{add}\,\alpha_{jl}^{add} + x_{il}^{add2}\,\alpha_{jl}^{add2} + x_{il}^{dom}\,\alpha_{jl}^{dom} \right) + GE_{ij}$$

where $y_{ij}$ is the trait value of genotype $i$ in environment (or population) $j$, $E_j$ is the environment (or population) main effect, $x_{il}^{add}$ are the additive genetic predictors of genotype $i$ for locus $l$, and $\alpha_{jl}^{add}$ are the associated effects. In models 2 and 3, $x_{il}^{dom}$ are the dominance genetic predictors, and $\alpha_{jl}^{dom}$ are the associated effects. In model 3, $x_{il}^{add}$ are the additive genetic predictors for maternal genotype $i$ at locus $l$, $x_{il}^{add2}$ are the additive genetic predictors for paternal genotype $i$, and $\alpha_{jl}^{add}$ and $\alpha_{jl}^{add2}$ are the associated effects. Genetic predictors are genotypic covariables that reflect the genotypic composition of a genotype at a specific chromosome location (Lynch & Walsh 1998). $GE_{ij}$ is assumed to follow a multi-Normal distribution with mean vector 0 and a variance-covariance matrix Σ, which can either be modelled explicitly (with an unstructured model) or by some parsimonious model (defined by option `VCMODEL`) as described in the `VGESELECT` procedure.

The backward selection procedure starts with the initial set of loci (defined by the `QTLCANDIDATES` parameter), and checks whether all loci are significant. If not, the locus with the lowest Wald test statistic is dropped from the model. This process is repeated until all loci in the model are significant. The procedure then switches to test whether the remaining QTLs show significant QTL-by-environment (or QTL-by-population) interaction, by breaking down the QTL effects into QTL main effects and QTL-by-environment (or QTL-by-population) interaction effects. If the QTL-by-environment (or QTL-by-population) interaction term is not significant, only a main effect is retained in the model for the corresponding QTL.
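One way to picture the final breakdown step, written for the additive effect only, is sketched below. The symbols for the main effect and the interaction effect are notation assumed here for illustration; they do not come from the procedure itself.

```latex
% Assumed notation: \alpha_{l}^{add} is the QTL main effect at locus l,
% (\alpha E)_{jl}^{add} is its QTL-by-environment (or QTL-by-population) interaction.
\alpha_{jl}^{add} = \alpha_{l}^{add} + (\alpha E)_{jl}^{add}
% If the interaction term is not significant for locus l, it is dropped,
% and the locus contributes x_{il}^{add}\,\alpha_{l}^{add} in every environment j.
```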

### Action with `RESTRICT`

Restrictions are not allowed.

### Reference

Lynch, M. & Walsh, B. (1998). *Genetics and Analysis of Quantitative Traits*. Sinauer Associates, Sunderland, MA.

### See also

Procedures: `QMESTIMATE`, `QMQTLSCAN`, `QMVAF`, `VGESELECT`.

Commands for: Statistical genetics and QTL estimation.

### Example

    CAPTION      'QMBACKSELECT example'; STYLE=meta
    SPLOAD       [PRINT=*] '%GENDIR%/Examples/F2maize_traits.gsh'
    &            '%GENDIR%/Examples/F2maizemarkers.GWB'; SHEET='LOCI'
    &            '%GENDIR%/Examples/F2maizemarkers.GWB'; SHEET='ADDPREDICTORS'
    &            '%GENDIR%/Examples/F2maizemarkers.GWB'; SHEET='DOMPREDICTORS'
    POINTER      [MODIFY=yes; NVAL=idlocus] addpred
    POINTER      [MODIFY=yes; NVAL=idlocus] dompred
    " Candidate QTL positions (peaks) from QMQTLSCAN"
    VARIATE      [VALUES=14...22,40...42,236...240] Qid
    " Best variance-covariance model from VGESELECT "
    TEXT         model; VALUE='fa'
    QMBACKSELECT [PRINT=summary,wald; POPULATIONTYPE=F2; ALPHA=0.05;\
                 VCMODEL=#model] TRAIT=yld; ENVIRONMENTS=E; GENOTYPES=G;\
                 ADDITIVEPREDICTORS=addpred; DOMINANCEPREDICTORS=dompred;\
                 QTLCANDIDATES=Qid; CHROMOSOMES=mkchr; POSITIONS=mkpos;\
                 IDLOCI=idlocus; QTLSELECTED=qtlsel; INTERACTIONS=qtlint;\
                 DOMSELECTED=domsel; DOMINTERACTIONS=domint; WALDSTAT=stat;\
                 PRWALD=prwald
    PRINT        qtlsel,qtlint,domsel,domint; DECIMALS=0