Performs multi-environment marker-trait association analysis in a genetically diverse population using bi-allelic and multi-allelic markers (M. Malosetti & J.T.N.M. Thissen).
Options
PRINT = string tokens | What to print (summary, progress); default summ |
---|---|
PLOT = string tokens | What to plot (profile, map); default prof, map |
RELATIONSHIPMODEL = string token | What model to use to account for genetic relatedness (eigenanalysis, subpopulations, null); default eige |
VCMODEL = string token | Specifies the variance-covariance model for the set of environments (identity, diagonal, cs, hcs, outside, fa, unstructured, best); default best |
CRITERION = string token | Defines which criterion is used to compare the different covariance structures (aic, sic); default sic |
MINORALLELE = scalar | Frequency of minor alleles; default 0.05 |
THRESHOLD = scalar | Threshold value for significant LD, on the -log10 scale; default 2 |
SUBPOPULATIONS = factor | Defines groupings of genotypes into subpopulations |
MODELPART = string token | Defines which part of the model should include SUBPOPULATIONS if RELATIONSHIPMODEL is set to subpopulations, or the principal components scores if RELATIONSHIPMODEL is set to eigenanalysis (fixed, random); default rand |
SCALING = string token | Whether to scale the scores by the square roots of their singular values if RELATIONSHIPMODEL is set to eigenanalysis (singularvalues, none); default sing |
STANDARDIZE = string token | Whether to standardize the marker scores according to their frequencies (frequency, none); default freq |
TITLE = text | General title for the plots |
YTITLE = text | Title for the y-axis |
XTITLE = text | Title for the x-axis |
Parameters
TRAIT = variates | Phenotypic trait to analyse; must be set |
---|---|
GENOTYPES = factors | Genotype factor; must be set |
ENVIRONMENTS = factors | Environment factor; must be set |
MKSCORES = pointers | Genotype codes for each marker; must be set |
CHROMOSOMES = factors | Linkage groups for the markers; must be set |
POSITIONS = variates | Positions within the linkage groups of markers; must be set |
MKNAMES = texts | Marker names |
WALDSTATISTICS = variates | Saves the Wald test statistics |
NDF = variates | Saves the degrees of freedom associated with the Wald test |
MINLOG10P = variates | Saves the associated probability values of the Wald test statistics, on a -log10 scale |
QSAVE = pointers | Saves a pointer with information and results for the significant effects |
DFILENAME = texts | Name of the graphics file for the plots |
Description
QMASSOCIATION performs a mixed model marker-trait association analysis (also known as linkage disequilibrium mapping) with data from multi-environment trials. When testing for marker-trait association in a genetically diverse population, it is necessary to account for population structure, which introduces non-independence between genotypes as a result of common genetic background. In addition, the multi-environment context requires the definition of the variance-covariance model to use for the random genetic effects in the different environments; this is specified by the VCMODEL option. The default is to fit all models and select the best one according to the criterion given by the CRITERION option, either the Schwarz Information Criterion (the default) or the Akaike Information Criterion.
The trait response variate is supplied by the TRAIT parameter, and the corresponding environment and genotype factors must be specified by the ENVIRONMENTS and GENOTYPES parameters, respectively. The marker scores are supplied in a pointer by the MKSCORES parameter. The length of the MKSCORES pointer must be equal to the number of markers, and each structure of the pointer must be a factor. The corresponding map information for the markers must be given by the CHROMOSOMES and POSITIONS parameters. Labels for the markers can be supplied by the MKNAMES parameter.
The model to account for genetic relatedness between genotypes is specified by the RELATIONSHIPMODEL option, with one of the following settings:
eigenanalysis | infers the underlying genetic substructure in the population by retaining the most significant principal components from the molecular marker matrix (Patterson et al. 2006). The scores of the significant axes are used as covariables in the mixed model, which effectively approximates structuring the genetic variance-covariance matrix by a coefficient of coancestry (kinship) matrix; |
---|---|
subpopulations | includes a factor supplied by the SUBPOPULATIONS option in the mixed model; and |
null | makes no correction for genetic relatedness. |
By default RELATIONSHIPMODEL=eigenanalysis. The scores of the significant axes are then calculated by the QEIGENANALYSIS procedure. The STANDARDIZE and SCALING options control whether the MKSCORES factors are standardized and scaled.
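As an illustration of frequency-based standardization, the Python sketch below assumes bi-allelic markers coded as 0, 1 or 2 copies of one allele and applies the normalisation described by Patterson et al. (2006): each marker is centred on its mean and divided by sqrt(p(1-p)), where p is the estimated allele frequency. The function name and the 0/1/2 coding are assumptions made for this example; the procedure itself works on MKSCORES factors, and its internal calculation may differ.

```python
import math

def standardize_marker(counts):
    """Centre and scale allele counts (0, 1 or 2) for one bi-allelic marker.

    Uses the Patterson et al. (2006) normalisation. Missing scores are
    given as None and are returned as None.
    """
    observed = [c for c in counts if c is not None]
    mean = sum(observed) / len(observed)
    p = mean / 2.0                      # estimated allele frequency
    if p <= 0.0 or p >= 1.0:
        raise ValueError("marker is monomorphic; it cannot be standardized")
    scale = math.sqrt(p * (1.0 - p))
    return [None if c is None else (c - mean) / scale for c in counts]

# Six genotypes, one of them with a missing score (hypothetical data).
scores = standardize_marker([0, 1, 2, 1, None, 0])
```

After this step the observed scores of each marker have mean zero, so markers with different allele frequencies contribute on a comparable scale to the eigenanalysis.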
The threshold for significant marker-trait association (on a -log10 scale) is defined by the THRESHOLD option. The default value is 2.
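Because the threshold is on a -log10 scale, the default of 2 corresponds to a probability of 0.01. A minimal Python sketch of the conversion follows; the function names are illustrative and not part of Genstat, and it assumes that values at or above the threshold count as significant.

```python
import math

def threshold_to_pvalue(threshold):
    """Probability that corresponds to a threshold on the -log10 scale."""
    return 10.0 ** (-threshold)

def is_significant(pvalue, threshold=2.0):
    """True when -log10(pvalue) reaches the threshold (assumed inclusive)."""
    return -math.log10(pvalue) >= threshold

cutoff = threshold_to_pvalue(2)      # probability equivalent of the default threshold
flag = is_significant(0.0004, 3)     # -log10(0.0004) is about 3.4
```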
The MINORALLELE option defines the frequency q below which alleles are considered rare. Rare alleles are automatically pooled together. Markers whose most frequent allele has a frequency greater than or equal to 1-q are considered close to fixation and are not used in the analysis.
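The two rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the "rare" label for the pooled class and the order of the two checks (fixation first, then pooling) are not taken from the procedure.

```python
from collections import Counter

def pool_rare_alleles(scores, q=0.05, pooled_label="rare"):
    """Pool alleles with frequency below q for one marker.

    scores is a list of allele labels (None = missing). Returns None when
    the marker is close to fixation and would be left out of the analysis.
    """
    observed = [s for s in scores if s is not None]
    n = len(observed)
    if n == 0:
        return None                     # nothing observed for this marker
    freq = {allele: count / n for allele, count in Counter(observed).items()}
    if max(freq.values()) >= 1.0 - q:
        return None                     # most frequent allele at or above 1-q
    rare = {allele for allele, f in freq.items() if f < q}
    return [pooled_label if s in rare else s for s in scores]

# Hypothetical multi-allelic marker: C and D each have frequency 0.02.
marker = ["A"] * 60 + ["B"] * 36 + ["C"] * 2 + ["D"] * 2
pooled = pool_rare_alleles(marker)
```

With q = 0.05, alleles C and D are merged into a single class, whereas a marker with one allele at frequency 0.97 would be dropped.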
The MODELPART option controls whether the principal components scores (if RELATIONSHIPMODEL=eigenanalysis) or the subpopulations factor (if RELATIONSHIPMODEL=subpopulations) are included as random or fixed terms (default random).
The PRINT option controls printed output, with settings:
summary | to print the list of markers with a significant association with the trait, and |
---|---|
progress | to monitor the progress of the analysis. |
The default is PRINT=summary.
The PLOT option controls what graphs are produced, with settings:
profile | plots a genome-wide profile of the -log10(P) of the test statistic, and |
---|---|
map | plots a map with the location of the detected significant markers, highlighting whether or not the marker showed significant interaction with the environment. |
By default both are plotted. The TITLE option can be used to provide a title for the graph, and the YTITLE and XTITLE options can supply titles for the y- and x-axis, respectively. By default, the plot is sent to the screen. However, you can supply a file for the plot, using the DFILENAME parameter. You can discover the types of graphics file that are supported by running the command DHELP possible.
The Wald test statistics, their numbers of degrees of freedom and the associated probability values on a -log10 scale can be saved by the WALDSTATISTICS, NDF and MINLOG10P parameters, respectively. The QSAVE parameter can be used to save a pointer containing information and results for the significant markers. The elements of the pointer are labelled as follows to simplify their subsequent use:
'procedure' | stores the string 'QMASSOCIATION' to indicate the source of the results, |
---|---|
'index' | index numbers of the significant markers, |
'mkname' | marker names, |
'chromosomes' | chromosomes, |
'positions' | positions, |
'minlog10p' | probability values on a -log10 scale, |
'nalleles' | number of alleles, |
'interaction' | an indicator of whether there was a significant interaction with the environment, |
'allele' | label of the relevant allele, |
'frequency' | allele frequencies, |
'effects' | effects, |
'seeffects' | standard errors of the effects, and |
'sed' | mean, minimum and maximum standard error of differences of the effects. |
The elements 'procedure', 'mkname' and 'interaction' are text structures; 'index', 'positions', 'minlog10p' and 'nalleles' are variates; 'allele', 'frequency', 'effects', 'seeffects' and 'sed' are pointers; 'chromosomes' is a factor.
Options: PRINT, PLOT, RELATIONSHIPMODEL, VCMODEL, CRITERION, MINORALLELE, THRESHOLD, SUBPOPULATIONS, MODELPART, SCALING, STANDARDIZE, TITLE, YTITLE, XTITLE.
Parameters: TRAIT, GENOTYPES, ENVIRONMENTS, MKSCORES, CHROMOSOMES, POSITIONS, MKNAMES, WALDSTATISTICS, NDF, MINLOG10P, QSAVE, DFILENAME.
Method
QMASSOCIATION performs a mixed model marker-trait association analysis, or LD mapping, in the context of multiple environments. Consequently, two major aspects must be handled in the statistical model: first, it needs to account for the heterogeneous genetic relatedness between individuals in the population (sometimes referred to as "population structure"); and second, it needs to model the genetic correlations between environments, since the same individuals are measured across environments.
Depending on the model settings, the model for marker-trait association may include the following terms: an intercept μ, an environment main effect (Ej), the effects associated with k principal components (PCscoreki), the effects of genotype groups (Groupk), the effects of the tested markers (MK) and their interactions with the environment, and the effects of genotypes (Gi) and their interactions with the environments.
The RELATIONSHIPMODEL option specifies which of the three possible models to use for the relatedness, and the MODELPART option controls whether these terms are treated as fixed or random.
Model | Fixed | Fixed or random | Fixed | Random |
Eigenanalysis | μ + Ej + | Σk { PCscoreki + PCscoreki.Ej } + | MK + MK.Ej + | Gi + Gi.Ej |
Subpopulations | μ + Ej + | Groupk + Groupk.Ej + | MK + MK.Ej + | Gi + Gi.Ej |
Null | μ + Ej + | | MK + MK.Ej + | Gi + Gi.Ej |
The next step is to define the variance-covariance model for the random genotype and genotype-by-environment interaction terms. The VCMODEL option allows you either to define a specific model or, with the best setting, to select the model automatically by fitting all possible models and choosing the best one using the Schwarz or Akaike information criterion.
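The selection made with VCMODEL=best can be pictured with the Python sketch below. It assumes the usual textbook definitions, AIC = deviance + 2p and SIC = deviance + p log(n), where the deviance is -2 times the maximised log-likelihood, p is the number of variance parameters and n is the number of observations; the exact definitions used by the procedure are not stated here and may differ. The deviances and parameter counts are invented purely for illustration.

```python
import math

def information_criterion(deviance, n_params, n_obs, criterion="sic"):
    """Penalised deviance; smaller values indicate a preferred model."""
    if criterion == "aic":
        return deviance + 2.0 * n_params
    if criterion == "sic":
        return deviance + n_params * math.log(n_obs)
    raise ValueError("criterion must be 'aic' or 'sic'")

def best_model(fits, n_obs, criterion="sic"):
    """fits maps a model name to (deviance, number of variance parameters)."""
    return min(
        fits,
        key=lambda name: information_criterion(*fits[name], n_obs, criterion),
    )

# Hypothetical fits of three covariance structures (numbers are made up).
fits = {
    "identity": (1250.0, 2),
    "cs": (1215.0, 3),
    "unstructured": (1190.0, 11),
}
chosen_sic = best_model(fits, n_obs=400, criterion="sic")
chosen_aic = best_model(fits, n_obs=400, criterion="aic")
```

In this made-up case the two criteria disagree: SIC penalises the extra parameters of the unstructured model more heavily and prefers cs, while AIC prefers unstructured.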
A Wald test is then used for each marker, individually, to test the null hypothesis that its effect is zero in every environment, with the most frequent allele set as the reference level. This is done by removing the marker main effect from the model in the VCOMPONENTS statement, which leaves only the term MK.E. (As a result, the term MK.E should not be interpreted as marker-by-environment interaction, but as environment-specific marker effects.) If the null hypothesis is rejected, a second test is performed to check whether the marker-by-environment interaction is significant. This is done by refitting the model, but this time including the marker main effect. If the marker-by-environment interaction is found to be non-significant, marker main effects are stored. Otherwise environment-specific marker effects are stored.
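The two-step decision can be sketched in Python as follows. The sketch makes several assumptions that are not stated above: the Wald statistics are referred to a chi-square distribution, the same -log10 threshold is applied to both tests, and the function names are illustrative. It is not the procedure's own code.

```python
import math
import sys

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum over r = 0 .. df/2 - 1 of (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    tail = math.erfc(math.sqrt(x / 2.0))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return tail + 2.0 * density * total

def minlog10p(wald, df):
    """-log10 of the chi-square probability, guarded against underflow."""
    return -math.log10(max(chi2_sf(wald, df), sys.float_info.min))

def classify_marker(wald_all, df_all, wald_inter, df_inter, threshold=2.0):
    """Return (what would be stored, -log10 P of the first test)."""
    first = minlog10p(wald_all, df_all)
    if first < threshold:
        return "not significant", first
    if minlog10p(wald_inter, df_inter) >= threshold:
        return "environment-specific effects", first
    return "main effects", first

# Hypothetical bi-allelic marker in three environments:
# 3 df for the MK.E test, 2 df for the interaction test.
result = classify_marker(20.0, 3, 1.0, 2)
```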
Action with RESTRICT
Restrictions are not allowed.
Reference
Patterson, N., Price, A.L., Reich, D. (2006). Population structure and eigenanalysis. PLoS Genetics, 2, e190. doi:10.1371/journal.pgen.0020190
See also
Procedures: QEIGENANALYSIS, QLDDECAY, QSASSOCIATION, QREPORT.
Commands for: Statistical genetics and QTL estimation.
Example
CAPTION 'QMASSOCIATION example'; STYLE=meta
QIMPORT [POPULATION=amp] '%GENDIR%/Examples/LDxE_example_geno.txt';\
        MAPFILE='%GENDIR%/Examples/LDxE_example_map.txt'; MKSCORES=mk;\
        CHROMOSOMES=mkchr; POSITIONS=mkpos; MKNAMES=mknames
IMPORT [PRINT=*] '%GENDIR%/Examples/LDxE_example_pheno.csv'
QMASSOCIATION [PRINT=#,progr; RELATION=subpop; SUBPOPULATIONS=group;\
        MODEL=fixed; TITLE='groups F'; THRESHOLD=3; VCMODEL=cs]\
        TRAIT=yield; GENOTYPES=geno; ENVIRONMENTS=env;\
        MKSCORES=mk; CHROMOSOMES=mkchr; POSITIONS=mkpos;\
        MKNAMES=mknames; WALD=Wald_out; NDF=ndf_out; MINLOG10P=P_out;\
        QSAVE=summ_out
PRINT summ_out