GPREDICTION procedure

Produces genomic predictions (breeding values) of tested and untested individuals, using phenotypic information on the tested set together with the genetic relationships across the whole population, as inferred from molecular marker information (M. Malosetti, M.P. Boer & S.J. Welham).

Options

PRINT = string token What to print (summary); default summ
PLOT = string token What to plot (scatterplot, pco); default scat, pco
MODELTYPE = string token Model to use to obtain the predictions (gblup, gaussian, exponential); default gblu
THETA = variate Values to use for the tuning parameter θ when the model is Gaussian or exponential
SIMILARITY = symmetric matrix Similarity matrix between individuals of the whole population

Parameters

TRAIT = variates Quantitative trait to be analysed; must be set
GENOTYPES = factors Genotype factor; must be set
MKSCORES = pointers Marker scores
IDMGENOTYPES = texts Labels of the tested and untested genotypes
PREDICTIONS = variates Saves the predictions
NEWGENOTYPES = factors Factor to index the predictions
TESTED = factors Factor that classifies NEWGENOTYPES as part of the tested or the untested set
SAVE = pointers Pointer to REML save structures to save details of the analyses

Description

In genomic prediction (or genomic selection, as introduced by Meuwissen et al. 2001), molecular marker scores for the individuals of a population are combined with phenotypic information on a subset of that population (the tested set) to obtain predictions (breeding values) for all the individuals of the population, i.e. both tested and untested.

GPREDICTION can obtain predictions from one of three mixed models, selected by the MODELTYPE option. The models differ in how the genetic variance-covariance matrix is defined. The default setting, gblup, uses a realised additive relationship matrix calculated from the markers; this is equivalent to including all the markers as random explanatory variables in the model, with a common variance component. Alternatively, the gaussian setting uses a Gaussian kernel to model the genetic variance-covariance, which can capture non-additive relationships (Gianola & van Kaam 2008, Piepho 2009). Finally, the exponential setting uses an exponential kernel. The Gaussian and exponential models require an extra (tuning) parameter θ, which determines how quickly the covariance between individuals decays with their distance in genetic space. Values for θ can be supplied in a variate using the THETA option; if THETA is unset, the value suggested by Crossa et al. (2010) is used (see the Method section). The SIMILARITY option can be used either to provide a similarity matrix, or to store the one calculated from the markers.

The TRAIT parameter must supply the observations (phenotypes) of the tested genotypes, and the GENOTYPES parameter must supply a factor to identify individuals within the tested set. The MKSCORES parameter supplies the marker scores of all the individuals in the population (tested and untested), and the IDMGENOTYPES parameter provides labels for all the genotypes in the population (tested and untested). MKSCORES must be set unless a relationship matrix has been supplied by the SIMILARITY option. The PREDICTIONS parameter can save the predictions, the NEWGENOTYPES parameter can save a factor identifying each individual in the population, and the TESTED parameter can save a factor classifying individuals as being part of the tested or untested set.
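As an illustration of how the tested and untested sets relate (not GPREDICTION's own code), the Python sketch below pairs every genotype label of the whole population with "tested", if it has phenotypes, or "untested" otherwise, which is the classification the TESTED factor records. The helper name `classify_genotypes` and the check that every phenotyped genotype also has marker labels are assumptions made for the sketch.

```python
def classify_genotypes(idm_genotypes, tested_genotypes):
    """Hypothetical helper (not part of Genstat): pair each genotype label of the
    whole population with "tested" or "untested", mimicking NEWGENOTYPES/TESTED."""
    tested = set(tested_genotypes)
    # Assumption: every phenotyped genotype must also have marker scores.
    missing = tested - set(idm_genotypes)
    if missing:
        raise ValueError(f"phenotyped genotypes without markers: {sorted(missing)}")
    return [(g, "tested" if g in tested else "untested") for g in idm_genotypes]


# Illustrative labels only, not taken from the example data.
population = ["G1", "G2", "G3", "G4"]
phenotyped = ["G2", "G4"]
for label, status in classify_genotypes(population, phenotyped):
    print(label, status)
```

The order of the output follows the whole-population labels, so untested individuals keep their place alongside the tested ones.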

You can set PRINT=summary to print a summary of the analysis. The SAVE parameter can save a pointer containing the save structures from the REML analyses that were performed.

The PLOT option controls the graphs that are produced, with settings:

    scatterplot for a scatter plot of the predictions against the observed values of the tested set, and
    pco for a plot of the first three axes of a principal coordinates analysis of the genetic similarities estimated from the markers, so that you can assess how well the training (tested) set covers the genetic space of the population.

Options: PRINT, PLOT, MODELTYPE, THETA, SIMILARITY.

Parameters: TRAIT, GENOTYPES, MKSCORES, IDMGENOTYPES, PREDICTIONS, NEWGENOTYPES, TESTED, SAVE.

Method

The prediction model is:

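$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon}$$

with u a vector of random genetic effects,

$$\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2),$$

and residuals ε with

$$\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{I}\sigma^2).$$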
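Two standard consequences of this model help explain how untested individuals receive predictions. They are shown for orientation only, with the variance components treated as known (GPREDICTION estimates them by REML, and this page does not describe its internal computations beyond the use of VPREDICT). Here Z is the design matrix linking the observations to all individuals in the population, so its columns for untested individuals are zero:

$$\operatorname{Var}(\mathbf{y}) = \mathbf{V} = \mathbf{Z}\mathbf{A}\mathbf{Z}^{\mathsf{T}}\sigma_u^2 + \mathbf{I}\sigma^2$$

$$\hat{\mathbf{u}} = \sigma_u^2\,\mathbf{A}\mathbf{Z}^{\mathsf{T}}\mathbf{V}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}), \qquad \hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf{T}}\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{V}^{-1}\mathbf{y}$$

Because A has rows for the untested individuals, their elements of û are nonzero whenever they are related, through A, to individuals in the tested set.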

The relationship matrix A is obtained from the molecular marker information, and its form depends on the model:

    GBLUP: $\mathbf{A} = \mathbf{Z}\mathbf{Z}^{\mathsf{T}}$, where Z is the genotype-by-marker matrix (not the design matrix Z of the model above)
    Gaussian: $\mathbf{A} = \exp(-\mathbf{D}^2/\theta)$, applied elementwise, where D² holds the squared Euclidean distances between individuals based on their markers, and θ is a tuning parameter
    Exponential: $\mathbf{A} = \exp(-\mathbf{D}/\theta)$, applied elementwise, where D holds the Euclidean distances between individuals based on their markers, and θ is a tuning parameter
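A minimal pure-Python sketch of the three relationship-matrix definitions above, for a genotype-by-marker matrix `Z` stored as a list of rows. It is illustrative only: it assumes the marker scores are used exactly as given, whereas GPREDICTION may centre or scale them before forming the GBLUP matrix, and all function names are hypothetical.

```python
import math


def squared_distances(Z):
    """Pairwise squared Euclidean distances between the rows (individuals) of Z."""
    return [[sum((a - b) ** 2 for a, b in zip(ri, rj)) for rj in Z] for ri in Z]


def gblup_matrix(Z):
    """GBLUP: A = Z Z^T, the inner products of the marker-score rows.
    Assumption: the scores are not centred or scaled."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in Z] for ri in Z]


def gaussian_kernel(Z, theta):
    """Gaussian: A = exp(-D^2 / theta), applied to each element."""
    return [[math.exp(-d2 / theta) for d2 in row] for row in squared_distances(Z)]


def exponential_kernel(Z, theta):
    """Exponential: A = exp(-D / theta), with D the unsquared Euclidean distance."""
    return [[math.exp(-math.sqrt(d2) / theta) for d2 in row]
            for row in squared_distances(Z)]


# Illustrative 0/1 scores for three individuals at three markers (not real data).
Z = [[0, 1, 1],
     [1, 1, 0],
     [0, 1, 1]]
print(gblup_matrix(Z))             # [[2, 1, 2], [1, 2, 1], [2, 1, 2]]
print(gaussian_kernel(Z, 1.0)[0])  # approx. [1.0, 0.135, 1.0]
```

Individuals 0 and 2 have identical marker profiles, so both kernels give them the maximum similarity of 1; a larger θ makes the similarity between different profiles decay more slowly.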

Before fitting the mixed model, GPREDICTION checks that the matrix A is positive semi-definite. If it is not, procedure POSSEMIDEFINITE is called to produce a positive semi-definite approximation, which is used instead. If more than one value is set for θ, a mixed model is fitted for each value, and the Akaike Information Criterion is used to select the best one. If no value is given for θ, then

$$\theta = \operatorname{median}(\mathbf{D}^2)/2$$

is used, as suggested by Crossa et al. (2010).
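The two ways of choosing θ described above can be sketched in Python as follows. Two details are assumptions rather than documented behaviour: the median is taken over distinct pairs of individuals (excluding the zero diagonal), and each candidate fit is summarised by its maximised log-likelihood and a parameter count. The log-likelihoods in the usage example are made up for illustration; in GPREDICTION they would come from the REML fits.

```python
import statistics


def default_theta(D2):
    """theta = median(D^2) / 2, taken over distinct pairs i < j.
    Assumption: the zero diagonal (self-distances) is excluded."""
    n = len(D2)
    off_diagonal = [D2[i][j] for i in range(n) for j in range(i + 1, n)]
    return statistics.median(off_diagonal) / 2


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: smaller is better."""
    return -2.0 * log_likelihood + 2.0 * n_params


def select_theta(fits):
    """fits: list of (theta, log_likelihood, n_params); return the theta with the lowest AIC."""
    best_theta, _, _ = min(fits, key=lambda f: aic(f[1], f[2]))
    return best_theta


# Squared distances for three illustrative individuals.
D2 = [[0, 2, 0],
      [2, 0, 2],
      [0, 2, 0]]
print(default_theta(D2))  # 1.0

# Hypothetical log-likelihoods for three candidate values of theta.
fits = [(0.25, -100.0, 3), (0.30, -98.5, 3), (0.35, -99.2, 3)]
print(select_theta(fits))  # 0.3
```

Because every candidate here has the same number of parameters, ranking by AIC gives the same choice as ranking by log-likelihood; the penalty term only matters when the parameter counts differ.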

After fitting the mixed model, predictions are formed using the VPREDICT directive.
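For intuition, the Python sketch below shows how a fitted kernel model yields predictions for untested individuals: each one borrows information from the tested phenotypes through its covariances in A. It is not GPREDICTION's implementation. It assumes known variance components (GPREDICTION estimates them by REML), one record per tested individual, and an overall mean as the only fixed effect; `solve` and `kernel_blup` are hypothetical helpers.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def kernel_blup(A, y, tested_idx, sigma_u2, sigma2):
    """Predict genetic effects for every individual (row of A) from one
    phenotype per tested individual, with known variance components and
    an overall mean as the only fixed effect (simplifying assumptions)."""
    t = len(tested_idx)
    # V = sigma_u^2 * A[T, T] + sigma^2 * I: covariance of the tested phenotypes.
    V = [[sigma_u2 * A[i][j] + (sigma2 if a == b else 0.0)
          for b, j in enumerate(tested_idx)]
         for a, i in enumerate(tested_idx)]
    # GLS estimate of the mean: (1' V^-1 1)^-1 1' V^-1 y.
    mu = sum(solve(V, y)) / sum(solve(V, [1.0] * t))
    # BLUP: u_hat = sigma_u^2 * A[:, T] V^-1 (y - mu), for tested and untested alike.
    alpha = solve(V, [yi - mu for yi in y])
    u_hat = [sigma_u2 * sum(A[k][j] * a for j, a in zip(tested_idx, alpha))
             for k in range(len(A))]
    return mu, u_hat


# Illustrative: individual 2 is untested but related to individual 0.
A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.0],
     [0.5, 0.0, 1.0]]
mu, u_hat = kernel_blup(A, y=[1.0, 3.0], tested_idx=[0, 1], sigma_u2=1.0, sigma2=1.0)
print(mu, u_hat)  # 2.0 [-0.5, 0.5, -0.25]
```

The untested individual 2 shares covariance 0.5 with individual 0, whose phenotype is below the mean, so it receives a negative predicted effect; an untested individual unrelated to every tested one would be predicted at 0, that is, at the mean.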

Action with RESTRICT

Restrictions are not allowed.

References

Crossa, J., De Los Campos, G., Pérez, P., Gianola, D., Burgueño, J., Araus, J.L., Makumbi, D., Singh, R.P., Dreisigacker, S., Yan, J., Arief, V., Banziger, M. & Braun, H.J. (2010). Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics, 186, 713-724.

Gianola, D. & van Kaam, J.B.C.H.M. (2008). Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics, 178, 2289-2303.

Meuwissen, T.H.E., Hayes, B.J. & Goddard, M.E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157, 1819-1829.

Piepho, H.P. (2009). Ridge regression and extensions for genome wide selection in maize. Crop Science, 49, 1165-1176.

See also

Directives: REML, VPREDICT, PCO.

Commands for: Statistical genetics and QTL estimation.

Example

CAPTION     'GPREDICTION example','A data set with bi-allelic markers';\ 
            STYLE=meta,plain
QIMPORT     [POPULATION=amp]\
            '%GENDIR%/Examples/dataCrossa_et_al2010_geno.txt';\
            MAPFILE='%GENDIR%/Examples/dataCrossa_et_al2010_map.txt';\ 
            MKSCORES=mk; CHROMOSOMES=mkchr; POSITIONS=mkpos; MKNAMES=mknames;\
            IDMGENOTYPES=geno_id
IMPORT      [PRINT=*]\
            '%GENDIR%/Examples/DataCrossa_et_al2010_Phenotypes.csv';\
            ISAVE=vars
" Model: GBLUP, relationship matrix is calculated and saved."
GPREDICTION [MODEL=gblup; PLOT=scatterplot,pco; SIMILARITY=Kmat] TRAIT=yld;\
            GENOTYPES=Geno; MKSCORES=mk; IDMGENOTYPES=geno_id;\
            PREDICTIONS=p_GBLUP; NEWGENOTYPES=Genopred; TESTED=set; SAVE=res
" Model: Gaussian kernel, with a range of values for theta
  and relationship matrix estimated from markers
  (saved in its own structure, KmatGauss)."
VARIATE     [VALUES=0.25,0.3...0.5] theta
GPREDICTION [MODEL=gaussian; PLOT=scatterplot,pco; SIMILARITY=KmatGauss; THETA=theta]\
            TRAIT=yld; GENOTYPES=Geno; MKSCORES=mk; IDMGENOTYPES=geno_id;\
            PREDICTIONS=p_Gauss; NEWGENOTYPES=Genopred2; TESTED=set2; SAVE=res2
" Model: EXPONENTIAL kernel, with range of values for theta
  and relationship matrix estimated from markers."
VARIATE     [VALUES=0.05,0.1...0.5] theta
GPREDICTION [MODEL=exp; PLOT=scatterplot,pco; SIMILARITY=KmatExp; THETA=theta]\
            TRAIT=yld; GENOTYPES=Geno; MKSCORES=mk; IDMGENOTYPES=geno_id;\ 
            PREDICTIONS=p_Exp; NEWGENOTYPES=Genopred3; TESTED=set3; SAVE=res3
Updated on March 29, 2022
