QMASSOCIATION procedure

Performs multi-environment marker-trait association analysis in a genetically diverse population using bi-allelic and multi-allelic markers (M. Malosetti & J.T.N.M. Thissen).

Options

`PRINT` = string tokens	What to print (`summary`, `progress`); default `summ`
`PLOT` = string tokens	What to plot (`profile`, `map`); default `prof`, `map`
`RELATIONSHIPMODEL` = string token	What model to use to account for genetic relatedness (`eigenanalysis`, `subpopulations`, `null`); default `eige`
`VCMODEL` = string token	Specifies the variance-covariance model for the set of environments (`identity`, `diagonal`, `cs`, `hcs`, `outside`, `fa`, `unstructured`, `best`); default `best`
`CRITERION` = string token	Defines which criterion is used to compare the different covariance structures (`aic`, `sic`); default `sic`
`MINORALLELE` = scalar	Frequency of minor alleles; default 0.05
`THRESHOLD` = scalar	Threshold value for significant LD, on the -log10 scale; default 2
`SUBPOPULATIONS` = factor	Defines groupings of genotypes into subpopulations
`MODELPART` = string token	Defines which part of the model should include `SUBPOPULATIONS` if `RELATIONSHIPMODEL` is set to `subpopulations`, or the principal components scores if `RELATIONSHIPMODEL` is set to `eigenanalysis` (`fixed`, `random`); default `rand`
`SCALING` = string token	Whether to scale the scores by the square roots of their singular values if `RELATIONSHIPMODEL` is set to `eigenanalysis` (`singularvalues`, `none`); default `sing`
`STANDARDIZE` = string token	Whether to standardize the marker scores according to their frequencies (`frequency`, `none`); default `freq`
`TITLE` = text	General title for the plots
`YTITLE` = text	Title for the y-axis
`XTITLE` = text	Title for the x-axis

Parameters

`TRAIT` = variates	Phenotypic trait to analyse; must be set
`GENOTYPES` = factors	Genotype factor; must be set
`ENVIRONMENTS` = factors	Environment factor; must be set
`MKSCORES` = pointers	Genotype codes for each marker; must be set
`CHROMOSOMES` = factors	Linkage groups for the markers; must be set
`POSITIONS` = variates	Positions within the linkage groups of markers; must be set
`MKNAMES` = texts	Marker names
`WALDSTATISTICS` = variates	Saves the Wald test statistics
`NDF` = variates	Saves the degrees of freedom associated with the Wald test
`MINLOG10P` = variates	Saves the associated probability values of the Wald test statistics, on a -log10 scale
`QSAVE` = pointers	Saves a pointer with information and results for the significant effects
`DFILENAME` = texts	Name of the graphics file for the plots.

Description

QMASSOCIATION performs a mixed model marker-trait association analysis (also known as linkage disequilibrium mapping) with data from multi-environment trials. When testing for marker-trait association in a genetically diverse population, it is necessary to account for population structure, which introduces non-independence between genotypes as a result of common genetic background. In addition, the multi-environment context requires a variance-covariance model to be defined for the random genetic effects in the different environments; this is specified by the VCMODEL option. The default, VCMODEL=best, is to fit all the models and select the best one according to the criterion given by the CRITERION option: either the Schwarz Information Criterion (the default) or the Akaike Information Criterion.

The trait response variate is supplied by the TRAIT parameter, and the corresponding environment and genotype factors must be specified by the ENVIRONMENTS and GENOTYPES parameters, respectively. The marker scores are supplied in a pointer by the MKSCORES parameter. The length of the MKSCORES pointer must be equal to the number of markers, and each structure of the pointer must be a factor. The corresponding map information for the markers must be given by the CHROMOSOMES and POSITIONS parameters. Labels for the markers can be supplied by the MKNAMES parameter.

The model to account for genetic relatedness between genotypes is specified by the RELATIONSHIPMODEL option, with one of the following settings:

`eigenanalysis`	infers the underlying genetic substructure in the population by retaining the most significant principal components from the molecular marker matrix (Patterson et al. 2006) – the scores of the significant axes are used as covariables in the mixed model, which effectively is an approximation to the structuring of the genetic variance covariance matrix by a coefficient of coancestry matrix (kinship matrix);
`subpopulations`	includes a factor supplied by the `SUBPOPULATIONS` option in the mixed model; and
`null`	makes no correction for genetic relatedness.

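By default RELATIONSHIPMODEL=eigenanalysis. The scores of the significant axes are then calculated by the QEIGENANALYSIS procedure. The STANDARDIZE option controls whether the marker scores in MKSCORES are standardized according to their frequencies, and the SCALING option controls whether the resulting principal component scores are scaled by the square roots of their singular values.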

The threshold for a significant marker-trait association (on a -log10 scale) is defined by the THRESHOLD option. The default value is 2.
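As a worked conversion between the -log10 scale and ordinary probability values, a threshold t corresponds to a probability of 10 to the power -t. The symbol t is introduced here only for illustration, and whether the procedure treats the boundary value itself as significant is not stated in this description.

```latex
-\log_{10}(P) \ge t \iff P \le 10^{-t},
\qquad t = 2 \Rightarrow P \le 0.01,
\qquad t = 3 \Rightarrow P \le 0.001
```

So the default THRESHOLD=2 corresponds to a significance level of 0.01, and THRESHOLD=3 (as used in the example) to 0.001.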

The MINORALLELE option defines the frequency q below which alleles are considered rare (default 0.05). Rare alleles are automatically pooled together. Markers whose most frequent allele has a frequency greater than or equal to 1-q are considered close to fixation and are not used in the analysis.

The MODELPART option controls whether the principal components scores (if RELATIONSHIPMODEL=eigenanalysis) or the subpopulations factor (if RELATIONSHIPMODEL=subpopulations) are included as random or fixed terms (default random).
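As a sketch of how the options described above combine, the call below requests the eigenanalysis correction with the principal component scores fitted as fixed terms, a stricter rare-allele cut-off and a stricter significance threshold. The identifiers (yield, geno, env, mk, mkchr, mkpos, mknames) are those of the Example section and are assumed to exist already; the option values are illustrative choices, not recommendations.

```genstat
"Eigenanalysis correction with the principal component scores as fixed terms"
QMASSOCIATION [PRINT=summary; RELATIONSHIPMODEL=eigenanalysis; MODELPART=fixed;\
              SCALING=singularvalues; STANDARDIZE=frequency;\
              MINORALLELE=0.1; THRESHOLD=3]\
              TRAIT=yield; GENOTYPES=geno; ENVIRONMENTS=env;\
              MKSCORES=mk; CHROMOSOMES=mkchr; POSITIONS=mkpos;\
              MKNAMES=mknames
```

With MINORALLELE=0.1, alleles with a frequency below 0.1 are pooled, and markers whose most frequent allele reaches a frequency of 0.9 or more are dropped. With THRESHOLD=3, only markers with a probability value of 0.001 or less are reported as significant.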

The PRINT option controls printed output, with settings:

`summary`	to print the list of markers with a significant association with the trait, and
`progress`	to monitor the progress of the analysis.

The default is PRINT=summary.

The PLOT option controls what graphs are produced, with settings:

`profile`	plots a genome wide profile of the -log10(P) of the test statistic, and
`map`	plots a map with the location of the detected significant markers, highlighting whether or not the marker showed significant interaction with the environment.

By default both are plotted. The TITLE option can be used to provide a title for the graph, and the YTITLE and XTITLE options can supply titles for the y- and x-axis, respectively. By default, the plot is sent to the screen. However, you can supply a file for the plot, using the DFILENAME parameter. You can discover the types of graphics file that are supported by running the command DHELP possible.
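A minimal sketch of the plotting options, assuming the same data structures as in the Example section. The file name qmassoc_profile.png is hypothetical, and the file type must be one of those supported by your installation. Only the profile plot is requested here, because this description does not say how several plots are written to a single file.

```genstat
"Genome-wide profile only, with custom titles, sent to a graphics file"
QMASSOCIATION [PRINT=summary; PLOT=profile; TITLE='Yield association profile';\
              YTITLE='-log10(P)'; XTITLE='Chromosome']\
              TRAIT=yield; GENOTYPES=geno; ENVIRONMENTS=env;\
              MKSCORES=mk; CHROMOSOMES=mkchr; POSITIONS=mkpos;\
              MKNAMES=mknames; DFILENAME='qmassoc_profile.png'
```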

The Wald test statistics, their numbers of degrees of freedom and the associated probability values on a -log10 scale can be saved by the WALDSTATISTICS, NDF and MINLOG10P parameters, respectively. The QSAVE parameter can be used to save a pointer containing information and results for the significant markers. The elements of the pointer are labelled as follows to simplify their subsequent use:

`'procedure'`	stores the string `'QMASSOCIATION'` to indicate the source of the results,
`'index'`	index numbers of the significant markers,
`'mkname'`	marker names,
`'chromosomes'`	chromosomes,
`'positions'`	positions,
`'minlog10p'`	probability values on a -log10 scale,
`'nalleles'`	number of alleles,
`'interaction'`	an indicator of whether there was a significant interaction with the environment,
`'allele'`	label of the relevant allele,
`'frequency'`	allele frequencies,
`'effects'`	effects,
`'seeffects'`	standard errors of the effects, and
`'sed'`	mean, minimum and maximum standard error of differences of the effects.

The elements 'procedure', 'mkname' and 'interaction' are text structures; 'index', 'positions', 'minlog10p' and 'nalleles' are variates; 'allele', 'frequency', 'effects', 'seeffects' and 'sed' are pointers; 'chromosomes' is a factor.
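Because the elements are labelled, individual results can be picked out by label. The sketch below assumes that summ_out was saved by QSAVE=summ_out in an earlier QMASSOCIATION call (as in the Example section), that at least one marker was significant, and that these four elements hold one value per significant marker so that they can be printed in parallel.

```genstat
"Tabulate name, chromosome, position and -log10(P) of the significant markers"
PRINT summ_out['mkname'],summ_out['chromosomes'],summ_out['positions'],\
      summ_out['minlog10p']
```

The pointer-valued elements, such as 'effects' and 'seeffects', hold further structures and are most simply inspected by printing the whole pointer.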

Options: PRINT, PLOT, RELATIONSHIPMODEL, VCMODEL, CRITERION, MINORALLELE, THRESHOLD, SUBPOPULATIONS, MODELPART, SCALING, STANDARDIZE, TITLE, YTITLE, XTITLE.

Parameters: TRAIT, GENOTYPES, ENVIRONMENTS, MKSCORES, CHROMOSOMES, POSITIONS, MKNAMES, WALDSTATISTICS, NDF, MINLOG10P, QSAVE, DFILENAME.

Method

QMASSOCIATION performs a mixed model marker-trait association analysis, or LD mapping, in the context of multiple environments. Consequently, two major aspects need to be handled in the statistical model: first, it needs to account for the heterogeneous genetic relatedness between individuals in the population (sometimes referred to as “population structure”); and second, it needs to model the genetic correlations between environments, since the same individuals are measured across environments.

Depending on the model settings, the model for marker-trait association may include the following terms: an intercept μ, an environment main effect (E_j), the effects associated with k principal components (PCscore_ki), the effects of genotype groups (Group_k), the effects of the tested markers (MK) and their interactions with the environment, and the effects of genotypes (G_i) and their interactions with the environments.

The RELATIONSHIPMODEL option specifies which of the three possible models to use for the relatedness, and the MODELPART option controls whether these terms are treated as fixed or random.

Model	Fixed	Fixed or random	Fixed	Random
Eigenanalysis	μ + E_j +	Σ_k { PCscore_ki + PCscore_ki.E_j } +	MK + MK.E_j +	G_i + G_i.E_j
Subpopulations	μ + E_j +	Group_k + Group_k.E_j +	MK + MK.E_j +	G_i + G_i.E_j
Null	μ + E_j +		MK + MK.E_j +	G_i + G_i.E_j

The next step is to define the variance-covariance model for the random genotype and genotype by environment interaction terms. The VCMODEL option allows you either to define a specific model or, with the best setting, to select the model automatically by fitting all possible models and choosing the best one using the Schwarz or Akaike information criterion.
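For orientation, the two criteria have the following textbook forms. The symbols are assumptions introduced here: L is the maximized (residual) likelihood, p is the number of variance-covariance parameters, and n is the effective number of observations. The exact definitions used inside the procedure are not given in this description.

```latex
\mathrm{AIC} = -2\log L + 2p,
\qquad
\mathrm{SIC} = -2\log L + p\,\log n
```

In both cases smaller values indicate a better model. Since log n exceeds 2 once n is 8 or more, the Schwarz criterion penalizes each additional parameter more heavily than the Akaike criterion and so tends to favour simpler covariance structures.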

A Wald test is then used for each marker, individually, to test the null hypothesis that its effect is zero in every environment. This is done by removing the marker main effect from the model in the VCOMPONENTS statement, which leaves only the term MK.E. (As a result, the term MK.E should not be interpreted as a marker-by-environment interaction, but as environment-specific marker effects.) The most frequent allele is set as the reference level. If the null hypothesis is rejected, a second test is performed to check whether the marker-by-environment interaction is significant. This is done by refitting the model, but this time including the marker main effect. If the marker-by-environment interaction is found to be non-significant, marker main effects are stored. Otherwise, environment-specific marker effects are stored.
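The two-stage test can be written out for the simplest case of a bi-allelic marker. The notation is illustrative and not taken from the procedure: α_j denotes the effect of the non-reference allele in environment j, and J is the number of environments.

```latex
% Stage 1: model contains MK.E only (no marker main effect)
H_0^{(1)}:\ \alpha_1 = \alpha_2 = \dots = \alpha_J = 0

% Stage 2: model contains MK + MK.E, and the MK.E term is tested
H_0^{(2)}:\ \alpha_1 = \alpha_2 = \dots = \alpha_J
```

If the first hypothesis is rejected but the second is not, a single marker main effect is stored. If both are rejected, the J environment-specific effects are stored.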

Action with `RESTRICT`

Restrictions are not allowed.

Reference

Patterson, N., Price, A.L., Reich, D. (2006). Population structure and eigenanalysis. PLoS Genetics, 2, e190. doi:10.1371/journal.pgen.0020190

Example

CAPTION       'QMASSOCIATION example'; STYLE=meta
QIMPORT       [POPULATION=amp] '%GENDIR%/Examples/LDxE_example_geno.txt';\
              MAPFILE='%GENDIR%/Examples/LDxE_example_map.txt'; MKSCORES=mk;\
              CHROMOSOMES=mkchr; POSITIONS=mkpos; MKNAMES=mknames
IMPORT        [PRINT=*] '%GENDIR%/Examples/LDxE_example_pheno.csv'

QMASSOCIATION [PRINT=#,progr; RELATION=subpop; SUBPOPULATIONS=group;\
              MODEL=fixed; TITLE='groups F'; THRESHOLD=3; VCMODEL=cs]\
              TRAIT=yield; GENOTYPES=geno; ENVIRONMENTS=env;\
              MKSCORES=mk; CHROMOSOMES=mkchr; POSITIONS=mkpos;\
              MKNAMES=mknames; WALD=Wald_out; NDF=ndf_out; MINLOG10P=P_out;\
              QSAVE=summ_out
PRINT         summ_out

Updated on June 19, 2019
