Produces genomic predictions (breeding values) of tested and untested individuals using phenotypic information of the tested set and the whole population genetic relationships, as inferred from molecular marker information (M. Malosetti, M.P. Boer & S.J. Welham).
|PRINT||What to print (
|PLOT||What to plot (
|MODELTYPE||Model to use to obtain the predictions (
|THETA||Values to use for the tuning parameter θ when the model is Gaussian or exponential|
|SIMILARITY||Similarity matrix between individuals of the whole population|
|TRAIT||Quantitative trait to be analysed; must be set|
|GENOTYPES||Genotype factor; must be set|
|IDMGENOTYPES||Labels of the tested and untested genotypes|
|PREDICTIONS||Saves the predictions|
|NEWGENOTYPES||Factor to index the predictions|
|TESTED||Factor that classifies
In genomic prediction (or genomic selection as introduced by Meuwissen et al. 2001), molecular markers of individuals of a population are used in combination with phenotypic information of a subset of that population (tested set) to obtain predictions (breeding values) of all the individuals of the population (i.e. both tested and untested).
GPREDICTION can be used to obtain predictions by one of three different mixed models, according to the setting of the MODELTYPE option. These differ in the way in which the genetic variance-covariance matrix is defined. The default setting, gblup, uses a realised additive relationship matrix calculated from markers, which is equivalent to the inclusion of all the markers as random explanatory variables in the model (with a common variance component). Alternatively, with the gaussian setting, a Gaussian kernel is used to model the genetic variance-covariance, which effectively accounts for non-additive relationships (Gianola & van Kaam 2008, Piepho 2009). Finally, with the exponential setting, an exponential kernel is used. For the Gaussian and exponential models, an extra (tuning) parameter θ is required, which determines how the covariance between individuals decays with distance in the genetic space. Values for θ can be supplied, in a variate, using the THETA option. If this is unset, the value suggested by Crossa et al. (2010) is used (see the Method section). The SIMILARITY option can be used either to provide a similarity matrix, or to store the one that is calculated using the markers.
The TRAIT parameter must supply the observations (phenotypes) of the tested genotypes, and the GENOTYPES parameter must supply a factor to identify individuals within the tested set. The MKSCORES parameter supplies the marker scores of all the individuals in the population (tested and untested), and the IDMGENOTYPES parameter provides labels for all the genotypes in the population (tested and untested). MKSCORES must be set unless a relationship matrix has been supplied by the SIMILARITY option. The PREDICTIONS parameter can save the predictions, the NEWGENOTYPES parameter can save a factor identifying each individual in the population, and the TESTED parameter can save a factor classifying individuals as being part of the tested or untested set.
You can set PRINT=summary to print a summary of the analysis. The SAVE parameter can save a pointer containing the save structures from the REML analyses that have been done.
The PLOT option controls the graphs that are produced, with settings:
|scatterplot||for a scatter plot of predictions versus observed values of the tested set, and|
|pco||for a plot showing the first three axes of a principal coordinates analysis of the genetic similarities estimated from markers, to enable you to assess the coverage of the genetic space of the population given by the tested (training) set|
The prediction model is:
y = X β + Z u + ε
with u a vector of random genetic effects,
u ~ N(0, A σᵤ²),
and residuals ε with
ε ~ N(0, I σ²).
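To illustrate how a model of this form yields predictions for untested individuals, here is a minimal Python sketch. It is not the procedure's implementation: it assumes the two variance components are already known (the procedure estimates them by REML), that the only fixed effect is an overall mean, that each tested individual has exactly one observation, and that the tested individuals occupy the first rows and columns of A. The function names `solve` and `blup` are hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def blup(A, y, var_u, var_e):
    """Return (mean, genetic effects of all individuals).

    A     : relationship matrix of the whole population (tested first)
    y     : one observation per tested individual
    var_u : genetic variance (assumed known in this sketch)
    var_e : residual variance (assumed known in this sketch)
    """
    n = len(y)
    # Variance of the observations: V = var_u * A_tt + var_e * I
    V = [[var_u * A[i][j] + (var_e if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    vinv_one = solve(V, [1.0] * n)
    vinv_y = solve(V, list(y))
    # Generalised least squares estimate of the overall mean
    mean = sum(vinv_y) / sum(vinv_one)
    # w = V^{-1} (y - mean)
    w = [vinv_y[i] - mean * vinv_one[i] for i in range(n)]
    # u_k = var_u * sum_i A[k][i] * w[i], for tested and untested k alike
    u = [var_u * sum(A[k][i] * w[i] for i in range(n)) for k in range(len(A))]
    return mean, u
```

An untested individual receives a non-zero prediction only through its relationships (the entries of A) with tested individuals, which is why the coverage of the genetic space by the tested set matters.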
The relationship matrix A is obtained from molecular marker information and, depending on the model, is formed as:
|GBLUP||A = Z Z′||here Z is the genotype-by-markers matrix (not the design matrix Z of the model above)|
|Gaussian||A = exp(-D² / θ)||D² is the squared Euclidean distance between individuals based on markers, and θ is a tuning parameter|
|Exponential||A = exp(-D / θ)||D is the Euclidean distance between individuals based on markers, and θ is a tuning parameter|
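The three definitions in the table can be sketched in Python as follows. This is an illustration only: it assumes the marker scores are already coded numerically, with one row per individual, and it applies no centring or scaling to the scores or distances, which the procedure itself may do. The function names are hypothetical.

```python
import math


def squared_distances(scores):
    """Squared Euclidean distance between every pair of rows of `scores`."""
    return [[sum((a - b) ** 2 for a, b in zip(ri, rj)) for rj in scores]
            for ri in scores]


def relationship_matrix(scores, model="gblup", theta=1.0):
    """Form A from a genotype-by-markers matrix for one of the three models."""
    if model == "gblup":
        # A = Z Z', the cross-products of the marker rows
        return [[sum(a * b for a, b in zip(ri, rj)) for rj in scores]
                for ri in scores]
    d2 = squared_distances(scores)
    if model == "gaussian":
        # A = exp(-D^2 / theta)
        return [[math.exp(-v / theta) for v in row] for row in d2]
    if model == "exponential":
        # A = exp(-D / theta), with D the (unsquared) Euclidean distance
        return [[math.exp(-math.sqrt(v) / theta) for v in row] for row in d2]
    raise ValueError("model must be 'gblup', 'gaussian' or 'exponential'")
```

For both kernels the diagonal of A is 1, and a larger θ makes the covariance decay more slowly with genetic distance.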
Before fitting the mixed model, the matrix A is checked to ensure that it is positive semi-definite. If it is not, the procedure POSSEMIDEFINITE is called to produce a positive semi-definite approximation to be used instead. If more than one value is set for θ, a mixed model is fitted for each value, and the Akaike Information Coefficient is used to select the best one. If no value is given for θ, then
θ = median(D²) / 2
is used, as suggested by Crossa et al. (2010).
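As a small worked illustration of this default, the following Python sketch computes θ from a matrix of squared distances. The text does not say which elements of D² enter the median; taking it over the distinct pairs of individuals (excluding the zero diagonal) is an assumption made here, and the function name `default_theta` is hypothetical.

```python
from statistics import median


def default_theta(d2):
    """Half the median squared distance over distinct pairs of individuals.

    d2 is a symmetric matrix of squared Euclidean distances. Using only
    the pairs i < j is an assumption of this sketch, not something the
    documentation specifies.
    """
    n = len(d2)
    pairs = [d2[i][j] for i in range(n) for j in range(i + 1, n)]
    return median(pairs) / 2
```

For three individuals with pairwise squared distances 2, 1 and 1, the median is 1 and the sketch gives θ = 0.5.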
After fitting the mixed model, predictions are formed using the
Restrictions are not allowed.
Crossa, J., De Los Campos, G., Pérez, P., Gianola, D., Burgueño, J., Araus, J.L., Makumbi, D., Singh, R.P., Dreisigacker, S., Yan, J., Arief, V., Banziger, M. & Braun, H.J. (2010). Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics, 186, 713-724.
Gianola, D. & van Kaam, J.B.C.H.M. (2008). Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics, 178, 2289-2303.
Meuwissen, T.H.E., Hayes, B.J. & Goddard, M.E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157, 1819-1829.
Piepho, H.P. (2009). Ridge regression and extensions for genome wide selection in maize. Crop Science, 49, 1165-1176.
Commands for: Statistical genetics and QTL estimation.
CAPTION      'GPREDICTION example','A data set with bi-allelic markers';\
             STYLE=meta,plain
QIMPORT      [POPULATION=amp]\
             '%GENDIR%/Examples/dataCrossa_et_al2010_geno.txt';\
             MAPFILE='%GENDIR%/Examples/dataCrossa_et_al2010_map.txt';\
             MKSCORES=mk; CHROMOSOMES=mkchr; POSITIONS=mkpos; MKNAMES=mknames;\
             IDMGENOTYPES=geno_id
IMPORT       [PRINT=*]\
             '%GENDIR%/Examples/DataCrossa_et_al2010_Phenotypes.csv';\
             ISAVE=vars
" Model: GBLUP, relationship matrix is calculated and saved."
GPREDICTION  [MODEL=gblup; PLOT=scatterplot,pco; SIMILARITY=Kmat] TRAIT=yld;\
             GENOTYPES=Geno; MKSCORES=mk; IDMGENOTYPES=geno_id;\
             PREDICTIONS=p_GBLUP; NEWGENOTYPES=Genopred; TESTED=set; SAVE=res
" Model: Gaussian kernel, with range of values for theta and
  relationship matrix estimated from markers."
VARIATE      [VALUES=0.25,0.3...0.5] theta
GPREDICTION  [MODEL=gaussian; PLOT=scatterplot,pco; SIMILARITY=KmatGauss;\
             THETA=theta] TRAIT=yld; GENOTYPES=Geno; MKSCORES=mk;\
             IDMGENOTYPES=geno_id; PREDICTIONS=p_Gauss;\
             NEWGENOTYPES=Genopred2; TESTED=set2; SAVE=res2
" Model: exponential kernel, with range of values for theta and
  relationship matrix estimated from markers."
VARIATE      [VALUES=0.05,0.1...0.5] theta
GPREDICTION  [MODEL=exp; PLOT=scatterplot,pco; SIMILARITY=KmatExp;\
             THETA=theta] TRAIT=yld; GENOTYPES=Geno; MKSCORES=mk;\
             IDMGENOTYPES=geno_id; PREDICTIONS=p_Exp;\
             NEWGENOTYPES=Genopred3; TESTED=set3; SAVE=res3