QMASSOCIATION procedure

Performs multi-environment marker-trait association analysis in a genetically diverse population using bi-allelic and multi-allelic markers (M. Malosetti & J.T.N.M. Thissen).

Options

PRINT = string tokens What to print (summary, progress); default summ
PLOT = string tokens What to plot (profile, map); default prof, map
RELATIONSHIPMODEL = string token What model to use to account for genetic relatedness (eigenanalysis, subpopulations, null); default eige
VCMODEL = string token Specifies the variance-covariance model for the set of environments (identity, diagonal, cs, hcs, outside, fa, unstructured, best); default best
CRITERION = string token Defines which criterion is used to compare the different covariance structures (aic, sic); default sic
MINORALLELE = scalar Frequency of minor alleles; default 0.05
THRESHOLD = scalar Threshold value for significant LD, on the -log10 scale; default 2
SUBPOPULATIONS = factor Defines groupings of genotypes into subpopulations
MODELPART = string token Defines which part of the model should include SUBPOPULATIONS if RELATIONSHIPMODEL is set to subpopulations, or the principal components scores if RELATIONSHIPMODEL is set to eigenanalysis (fixed, random); default rand
SCALING = string token Whether to scale the scores by the square roots of their singular values if RELATIONSHIPMODEL is set to eigenanalysis (singularvalues, none); default sing
STANDARDIZE = string token Whether to standardize the marker scores according to their frequencies (frequency, none); default freq
TITLE = text General title for the plots
YTITLE = text Title for the y-axis
XTITLE = text Title for the x-axis

Parameters

TRAIT = variates Phenotypic trait to analyse; must be set
GENOTYPES = factors Genotype factor; must be set
ENVIRONMENTS = factors Environment factor; must be set
MKSCORES = pointers Genotype codes for each marker; must be set
CHROMOSOMES = factors Linkage groups for the markers; must be set
POSITIONS = variates Positions within the linkage groups of markers; must be set
MKNAMES = texts Marker names
WALDSTATISTICS = variates Saves the Wald test statistics
NDF = variates Saves the degrees of freedom associated with the Wald test
MINLOG10P = variates Saves the associated probability values of the Wald test statistics, on a -log10 scale
QSAVE = pointers Saves a pointer with information and results for the significant effects
DFILENAME = texts Name of the graphics file for the plots.

Description

QMASSOCIATION performs a mixed model marker-trait association analysis (also known as linkage disequilibrium mapping) with data from multi-environment trials. When testing for marker-trait association in a genetically diverse population, it is necessary to account for population structure, which introduces non-independence between genotypes as a result of common genetic background. In addition, the multi-environment context requires the definition of a variance-covariance model for the random genetic effects in the different environments; this is specified by the VCMODEL option. The default is to fit all models and select the best one according to the criterion given by the CRITERION option, either the Schwarz Information Criterion (the default) or the Akaike Information Criterion.
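As a hedged illustration of the VCMODEL and CRITERION options, the call below asks the procedure to compare the covariance structures and to choose between them by the Akaike rather than the Schwarz criterion. The identifiers yield, geno, env, mk, mkchr and mkpos are assumptions borrowed from the Example section; they are not names required by the procedure.

```genstat
"Sketch only: the data structures are assumed to exist as in the Example"
QMASSOCIATION [VCMODEL=best; CRITERION=aic]\
              TRAIT=yield; GENOTYPES=geno; ENVIRONMENTS=env;\
              MKSCORES=mk; CHROMOSOMES=mkchr; POSITIONS=mkpos
```

Setting VCMODEL to a single structure, as the Example does with VCMODEL=cs, fits only that structure instead of comparing several.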

The trait response variate is supplied by the TRAIT parameter, and the corresponding environment and genotype factors must be specified by the ENVIRONMENTS and GENOTYPES parameters, respectively. The marker scores are supplied in a pointer by the MKSCORES parameter. The length of the MKSCORES pointer must be equal to the number of markers, and each structure of the pointer must be a factor. The corresponding map information for the markers must be given by the CHROMOSOMES and POSITIONS parameters. Labels for the markers can be supplied by the MKNAMES parameter.

The model to account for genetic relatedness between genotypes is specified by the RELATIONSHIPMODEL option, with one of the following settings:

    eigenanalysis infers the underlying genetic substructure in the population by retaining the most significant principal components from the molecular marker matrix (Patterson et al. 2006) – the scores of the significant axes are used as covariables in the mixed model, which effectively is an approximation to the structuring of the genetic variance covariance matrix by a coefficient of coancestry matrix (kinship matrix);
    subpopulations includes a factor supplied by the SUBPOPULATIONS option in the mixed model; and
    null makes no correction for genetic relatedness.

By default RELATIONSHIPMODEL=eigenanalysis. The scores of the significant axes are then calculated by the QEIGENANALYSIS procedure. The STANDARDIZE and SCALING options control whether the MKSCORES factors are standardized and scaled.

The threshold for significant marker-trait association (on a -log10 scale) is defined by the THRESHOLD option. The default value is 2, which corresponds to a probability value of 0.01.

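The MINORALLELE option defines the frequency q below which alleles are considered rare. Rare alleles are automatically pooled together. Markers whose most frequent allele has a frequency greater than or equal to 1-q are considered close to fixation and are not used in the analysis. For example, with the default q = 0.05, a marker whose most frequent allele has a frequency of 0.96 is excluded, because 0.96 is greater than 0.95.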

The MODELPART option controls whether the principal components scores (if RELATIONSHIPMODEL=eigenanalysis) or the subpopulations factor (if RELATIONSHIPMODEL=subpopulations) are included as random or fixed terms (default random).
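The options that control the correction for relatedness can be combined as in the following sketch, which fits the principal component scores as fixed covariables and switches off their scaling. It is an illustration rather than a recommended analysis, and the data identifiers are assumed to be those set up in the Example section.

```genstat
"Sketch only: eigenanalysis correction with the scores fitted as fixed terms"
QMASSOCIATION [RELATIONSHIPMODEL=eigenanalysis; MODELPART=fixed;\
              SCALING=none; STANDARDIZE=frequency]\
              TRAIT=yield; GENOTYPES=geno; ENVIRONMENTS=env;\
              MKSCORES=mk; CHROMOSOMES=mkchr; POSITIONS=mkpos

"The same analysis with no correction for relatedness, for comparison"
QMASSOCIATION [RELATIONSHIPMODEL=null]\
              TRAIT=yield; GENOTYPES=geno; ENVIRONMENTS=env;\
              MKSCORES=mk; CHROMOSOMES=mkchr; POSITIONS=mkpos
```

With RELATIONSHIPMODEL=null there are no relatedness terms in the model, so MODELPART, SCALING and STANDARDIZE are left unset in the second call.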

The PRINT option controls printed output, with settings:

    summary to print the list of markers with a significant association with the trait, and
    progress to monitor the progress of the analysis.

The default is PRINT=summary.

The PLOT option controls what graphs are produced, with settings:

    profile plots a genome wide profile of the -log10(P) of the test statistic, and
    map plots a map with the location of the detected significant markers, highlighting whether or not the marker showed significant interaction with the environment.

By default both are plotted. The TITLE option can be used to provide a title for the graph, and the YTITLE and XTITLE options can supply titles for the y- and x-axis, respectively. By default, the plot is sent to the screen. However, you can supply a file for the plot, using the DFILENAME parameter. You can discover the types of graphics file that are supported by running the command DHELP possible.
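A possible way of sending only the profile plot to a file is sketched below. The file name qmassoc_profile.png and the titles are hypothetical, and the .png type is an assumption that should be checked against the list of supported graphics files; the data identifiers are those of the Example section.

```genstat
"Sketch only: hypothetical file name and titles"
QMASSOCIATION [PLOT=profile; TITLE='Yield association profile';\
              YTITLE='-log10(P)'; XTITLE='Chromosome']\
              TRAIT=yield; GENOTYPES=geno; ENVIRONMENTS=env;\
              MKSCORES=mk; CHROMOSOMES=mkchr; POSITIONS=mkpos;\
              DFILENAME='qmassoc_profile.png'
```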

The Wald test statistics, their numbers of degrees of freedom and the associated probability values on a -log10 scale can be saved by the WALDSTATISTICS, NDF and MINLOG10P parameters, respectively. The QSAVE parameter can be used to save a pointer containing information and results for the significant markers. The elements of the pointer are labelled as follows to simplify their subsequent use:

    'procedure' stores the string 'QMASSOCIATION' to indicate the source of the results,
    'index' index numbers of the significant markers,
    'mkname' marker names,
    'chromosomes' chromosomes,
    'positions' positions,
    'minlog10p' probability values on a -log10 scale,
    'nalleles' number of alleles,
    'interaction' an indicator of whether there was a significant interaction with the environment,
    'allele' label of the relevant allele,
    'frequency' allele frequencies,
    'effects' effects,
    'seeffects' standard errors of the effects, and
    'sed' mean, minimum and maximum standard error of differences of the effects.

The elements 'procedure', 'mkname' and 'interaction' are text structures; 'index', 'positions', 'minlog10p' and 'nalleles' are variates; 'allele', 'frequency', 'effects', 'seeffects' and 'sed' are pointers; and 'chromosomes' is a factor.
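The labels allow individual elements of the saved pointer to be referred to by name. The following sketch assumes that QSAVE=summ_out was set in an earlier call, as in the Example section, and that the four elements printed hold one value per significant marker so that they can be displayed in parallel columns.

```genstat
"Sketch only: summ_out is assumed to have been saved by QSAVE"
PRINT summ_out['mkname'],summ_out['chromosomes'],\
      summ_out['positions'],summ_out['minlog10p']
```

The elements that are themselves pointers, such as 'effects', would need to be printed separately.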

Options: PRINT, PLOT, RELATIONSHIPMODEL, VCMODEL, CRITERION, MINORALLELE, THRESHOLD, SUBPOPULATIONS, MODELPART, SCALING, STANDARDIZE, TITLE, YTITLE, XTITLE.

Parameters: TRAIT, GENOTYPES, ENVIRONMENTS, MKSCORES, CHROMOSOMES, POSITIONS, MKNAMES, WALDSTATISTICS, NDF, MINLOG10P, QSAVE, DFILENAME.

Method

QMASSOCIATION performs a mixed model marker-trait association analysis, or LD mapping, in the context of multiple environments. Consequently, it requires two major aspects to be handled in the statistical model: first it needs to account for the heterogeneous genetic relatedness between individuals in the population (sometimes referred to as “population structure”); and second it needs to model the genetic correlations between environments, since the same individuals are measured across environments.

Depending on the model settings, the model for marker-trait association may include the following terms: an intercept μ, an environment main effect (Ej), the effects associated with k principal components (PCscoreki), the effects of genotype groups (Groupk), the effects of the tested markers (MK) and their interactions with the environment, and the effects of genotypes (Gi) and their interactions with the environments.

The RELATIONSHIPMODEL option specifies which of the three possible models to use for the relatedness, and the MODELPART option controls whether these terms are treated as fixed or random.

Model           Fixed      Fixed or random                      Fixed           Random
Eigenanalysis   μ + Ej     + Σk { PCscoreki + PCscoreki.Ej }    + MK + MK.Ej    + Gi + Gi.Ej
Subpopulations  μ + Ej     + Groupk + Groupk.Ej                 + MK + MK.Ej    + Gi + Gi.Ej
Null            μ + Ej                                          + MK + MK.Ej    + Gi + Gi.Ej

The next step is to define the variance-covariance model for the random genotype and genotype by environment interaction terms. The VCMODEL option allows you either to define a specific model or, with the best setting, to select the model automatically by fitting all possible models and choosing the best one using the Schwarz or Akaike information criterion.

A Wald test is then used for each marker, individually, to test the null hypothesis that its effect is zero in every environment, with the most frequent allele set as the reference level. The test is constructed by removing the marker main effect from the model in the VCOMPONENTS statement, leaving only the term MK.E. (As a result, the term MK.E should not be interpreted as marker-by-environment interaction, but as environment-specific marker effects.) If the null hypothesis is rejected, a second test is performed to check whether the marker-by-environment interaction is significant. This is done by refitting the model, this time including the marker main effect. If the marker-by-environment interaction is non-significant, marker main effects are stored; otherwise, environment-specific marker effects are stored.
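The saved -log10 probabilities can be inspected after the analysis. The sketch below counts the markers whose first Wald test exceeds a cut-off; it assumes that MINLOG10P=P_out was saved as in the Example section, and the names cutoff and nabove are hypothetical.

```genstat
"Sketch only: P_out is assumed to have been saved by MINLOG10P"
SCALAR    cutoff; VALUE=3
"The comparison gives 1 for markers at or above the cut-off, 0 otherwise"
CALCULATE nabove = SUM(P_out >= cutoff)
PRINT     nabove
```

This count is based on the saved probabilities alone, so it need not agree with the list saved by QSAVE if the procedure applies any further selection.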

Action with RESTRICT

Restrictions are not allowed.
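If the data vectors may have been restricted earlier in a program, one option is to cancel the restrictions before calling the procedure. The sketch below relies on RESTRICT cancelling a restriction when no condition is supplied, and uses the identifiers of the Example section as assumptions.

```genstat
"Sketch only: cancel any restriction on the trait and the two factors"
RESTRICT yield,geno,env
```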

Reference

Patterson, N., Price, A.L., Reich, D. (2006). Population structure and eigenanalysis. PLoS Genetics, 2, e190. doi:10.1371/journal.pgen.0020190

See also

Procedures: QEIGENANALYSIS, QLDDECAY, QSASSOCIATION, QREPORT.

Commands for: Statistical genetics and QTL estimation.

Example

CAPTION       'QMASSOCIATION example'; STYLE=meta
"Import the marker scores and map information, then the phenotypic data"
QIMPORT       [POPULATION=amp] '%GENDIR%/Examples/LDxE_example_geno.txt';\
              MAPFILE='%GENDIR%/Examples/LDxE_example_map.txt'; MKSCORES=mk;\
              CHROMOSOMES=mkchr; POSITIONS=mkpos; MKNAMES=mknames
IMPORT        [PRINT=*] '%GENDIR%/Examples/LDxE_example_pheno.csv'

"Association analysis with the subpopulations factor fitted as a fixed term,"
"VCMODEL=cs and a significance threshold of 3 on the -log10 scale"
QMASSOCIATION [PRINT=#,progr; RELATION=subpop; SUBPOPULATIONS=group;\
              MODEL=fixed; TITLE='groups F'; THRESHOLD=3; VCMODEL=cs]\
              TRAIT=yield; GENOTYPES=geno; ENVIRONMENTS=env;\
              MKSCORES=mk; CHROMOSOMES=mkchr; POSITIONS=mkpos;\
              MKNAMES=mknames; WALD=Wald_out; NDF=ndf_out; MINLOG10P=P_out;\
              QSAVE=summ_out
"Print the information saved for the significant markers"
PRINT         summ_out
Updated on June 19, 2019