Performs marker-trait association analysis in a genetically diverse population using bi-allelic and multi-allelic markers (M. Malosetti & J.T.N.M. Thissen).
|What to print (
||What to plot (
||What model to use to account for genetic relatedness (
||Provides the scores of significant principal components, obtained from an eigenvalue analysis|
||What model to use for GWAS (
||Defines a genome-wide significance level to calculate the threshold; default 0.05|
||Method to define the threshold for significance (
||Threshold value for significant LD, on the -log10 scale; default 2|
||Minimum distance gap between independent tests (i.e. distance beyond which loci are expected to be in linkage equilibrium) when
||Frequency of minor alleles; default 0.05|
||Kinship matrix containing coefficients of coancestries|
||Method to use to estimate kinship matrix if not supplied by
||Defines groupings of genotypes into subpopulations|
||Defines which part of the model should include
||Whether to scale the scores by the square roots of their singular values (
||Whether to standardize the marker scores according to their frequencies (
||Colours to use for the chromosomes; default
||General title for the plots|
||Title for the y-axis|
||Title for the x-axis|
||Phenotypic trait to analyse; must be set|
||Genotype codes for each marker; must be set|
||Linkage groups for the markers; must be set|
||Positions within the linkage groups of markers; must be set|
||Labels for the genotypes corresponding to the markers|
||Name of a comma-delimited file (
||Name of a comma-delimited file (
||Saves the Wald test statistics|
||Saves the degrees of freedom associated with the Wald test|
||Saves the associated probability values of the Wald test statistics, on a -log10 scale|
||Saves the inflation factor i.e. slope of the QQ plot of -log10(P) values|
||Saves a pointer with information and results for the significant effects|
||Name of the graphical file for the plots|
QSASSOCIATION performs a mixed model marker-trait association analysis (also known as linkage disequilibrium mapping) with data from a single-environment trial. The trait data are supplied by the TRAIT parameter. The marker scores can be supplied as a pointer of factors by the MKSCORES parameter. The length of the pointer must be equal to the number of markers. Alternatively, if the fast method is requested by the METHOD option, they can be supplied in a file whose name is specified by the GENFILENAME parameter. The file must be comma-delimited (*.csv), with the markers in the rows and the genotypes in the columns. The first column of the file contains the marker names, and the first row of the file contains the names of the genotypes.
The corresponding map information for the markers can be supplied by the CHROMOSOMES and POSITIONS parameters, and the labels for the markers can be supplied by the MKNAMES parameter. The IDMGENOTYPE parameter can be used to supply the labels of the genotypes in the marker data. Alternatively, if the fast method is requested by the METHOD option, the map information can be supplied in a file, whose name is specified by the MAPFILE parameter. This file must also be comma-delimited (*.csv), and should contain three columns (without headings): marker name, linkage group (chromosome), and position within linkage group of each marker.
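The two file layouts described above can be checked before running the fast method. The following is a minimal Python sketch, not part of Genstat; the toy marker names, the 0/1 score coding, the `marker` label in the top-left cell and the helper names `read_genotype_file` and `read_map_file` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical genotype file: markers in rows, genotypes in columns.
# The first row holds genotype names, the first column holds marker names.
geno_csv = """marker,G1,G2,G3,G4
m1,0,1,1,0
m2,1,1,0,0
m3,0,0,1,1
"""

# Hypothetical map file: no headings; marker name, linkage group, position.
map_csv = """m1,1,0.0
m2,1,12.5
m3,2,3.2
"""

def read_genotype_file(text):
    """Return (genotype names, {marker name: list of scores})."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    genotype_names = rows[0][1:]
    scores = {r[0]: r[1:] for r in rows[1:]}
    return genotype_names, scores

def read_map_file(text):
    """Return {marker name: (linkage group, position)}."""
    return {r[0]: (r[1], float(r[2]))
            for r in csv.reader(io.StringIO(text)) if r}

genotype_names, scores = read_genotype_file(geno_csv)
marker_map = read_map_file(map_csv)

# Markers that have scores but no map entry would need attention.
missing = [m for m in scores if m not in marker_map]
print(genotype_names, missing)
```

A check like this only confirms that the two files agree on marker names and that each marker row has one score per genotype column; it does not reproduce any validation that the procedure itself may perform.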
To avoid false positives in association mapping studies, some form of control is necessary for the genetic relatedness. The model to use is specified by the
RELATIONSHIPMODEL option, with one of the following settings:
||infers the underlying genetic substructure in the population by retaining the most significant principal components from the molecular marker matrix (Patterson et al. 2006) – the scores of the significant axes are used as covariables in the mixed model, which effectively is an approximation to the structuring of the genetic variance covariance matrix by a coefficient of coancestry matrix (kinship matrix);|
||is the default model, and includes a kinship matrix in the mixed model;|
||includes a factor supplied by the
||makes no correction for genetic relatedness.|
With RELATIONSHIPMODEL=kinship, the kinship matrix can be specified by the KMATRIX option. Alternatively, it can be calculated from the MKSCORES using the QKINSHIPMATRIX procedure with the method specified by the KMETHOD option (and can then be stored by KMATRIX).
With RELATIONSHIPMODEL=eigenanalysis, the scores of the significant axes can be supplied using the SCORES option. Otherwise they are calculated by the QEIGENANALYSIS procedure (and can then be stored by SCORES). The STANDARDIZE and SCALING options control whether the MKSCORES factors are standardized and scaled; see QEIGENANALYSIS for more details.
The MODELPART option controls whether the principal component scores (if RELATIONSHIPMODEL=eigenanalysis) or the subpopulations factor (if RELATIONSHIPMODEL=subpopulations) are included as random or fixed terms (default random).
The threshold for significant marker trait association (on a -log10 scale) is defined by the
THRESHOLD option. The default value is 2.
The MINORALLELE option defines the frequency q below which alleles are considered rare. Rare alleles are automatically pooled together. Markers whose major allele frequency is greater than or equal to 1-q are considered close to fixation and are not used in the analysis.
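The rare-allele rule above can be illustrated for a single marker. This is a hedged Python sketch rather than the procedure's own code: the function name `pool_rare_alleles`, the `"rare"` label for the pooled class, the use of `None` for missing scores and the calculation of frequencies over non-missing scores are assumptions.

```python
from collections import Counter

def pool_rare_alleles(scores, q=0.05, pooled_label="rare"):
    """Apply the rare-allele rule to the scores of one marker.

    Returns None when the marker is close to fixation (major allele
    frequency >= 1 - q). Otherwise returns the scores with every allele
    whose frequency is below q replaced by pooled_label.
    """
    observed = [s for s in scores if s is not None]  # None = missing score
    n = len(observed)
    freq = {allele: count / n for allele, count in Counter(observed).items()}
    if max(freq.values()) >= 1 - q:
        return None                                  # marker is dropped
    rare = {allele for allele, f in freq.items() if f < q}
    return [pooled_label if s in rare else s for s in scores]

# Hypothetical multi-allelic marker: C and D are each below q = 0.05.
marker = ["A"] * 90 + ["B"] * 6 + ["C"] * 2 + ["D"] * 2
pooled = pool_rare_alleles(marker, q=0.05)

# Hypothetical marker close to fixation: major allele frequency 0.97.
near_fixed = pool_rare_alleles(["A"] * 97 + ["B"] * 3, q=0.05)
```

In this toy case the alleles C and D are merged into one pooled class of four observations, while the second marker is excluded because its major allele frequency exceeds 1-q.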
The METHOD option defines the method to use to fit marker-trait association models, either exact or fast. For the exact method, the mixed models are solved for each marker separately. For the fast method, the mixed model is only solved for the genetic background model, without the markers in the model. The estimated variance-covariance matrix from this genetic background model is then used to perform a generalized least squares scan for all the markers. The fast method is implemented only for bi-allelic markers, such as SNPs.
The THRMETHOD option controls how the threshold for significance is defined. The default, THRMETHOD=neffective, first determines the effective number of columns (nC) in the marker matrix data using the estimator given by Patterson et al. (2006), and calculates the threshold as -log10(α/nC). The parameter α is the genome-wide type I error rate, which is defined by the ALPHA option (default 0.05). Alternatively, THRMETHOD=bonferroni calculates the effective number of tests assuming one independent test within blocks of a size specified by the DISTANCE option. If DISTANCE is not set, the default is to take an independent test at every marker, which is very conservative in most cases. Finally, if THRMETHOD=given, a user-defined threshold value (on a -log10 scale) must be specified using the THRESHOLD option. With the other settings of THRMETHOD, THRESHOLD can be used to save the estimated threshold.
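The threshold arithmetic above can be sketched in a few lines of Python. This is an illustration only: the toy map, the helper names and in particular the blocking rule in `count_blocks` (a new block starts once a marker lies at least `distance` beyond the start of the current block) are assumptions, and the Patterson et al. estimator of nC is not reproduced here.

```python
import math

def threshold_from_tests(alpha, n_tests):
    """Threshold on the -log10 scale for n_tests independent tests."""
    return -math.log10(alpha / n_tests)

def count_blocks(map_positions, distance):
    """Count one independent test per block of width `distance`.

    map_positions: {linkage group: list of marker positions}.
    Assumed rule: a new block starts whenever a marker lies `distance`
    or more beyond the start of the current block.
    """
    n_blocks = 0
    for positions in map_positions.values():
        block_start = None
        for pos in sorted(positions):
            if block_start is None or pos - block_start >= distance:
                n_blocks += 1
                block_start = pos
    return n_blocks

# Hypothetical map with two linkage groups.
toy_map = {"1": [0, 4, 9, 16, 31], "2": [2, 3, 40]}
n_markers = sum(len(p) for p in toy_map.values())

# One test per marker (DISTANCE not set) versus one test per 15-unit block.
thr_every_marker = threshold_from_tests(0.05, n_markers)
thr_blocks = threshold_from_tests(0.05, count_blocks(toy_map, 15))
print(round(thr_every_marker, 3), round(thr_blocks, 3))
```

With THRMETHOD=neffective the same formula applies with the estimated nC in place of the block count. Because fewer tests are assumed, the block-based threshold is lower than the one-test-per-marker threshold, which is why the latter is described as conservative.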
||to print the list of markers with a significant association with the trait, and|
||to monitor the progress of the analysis.|
The default is
The PLOT option controls what graphs are produced, with settings:
||plots a genome wide profile of the -log10(P) of the test statistic,|
||plots a map with the location of the detected significant markers, highlighting whether or not the marker showed significant interaction with the environment, and|
||makes a QQ plot of the -log10(P) values.|
The TITLE option can be used to provide a title for the graph, and the YTITLE and XTITLE options can supply titles for the y- and x-axis, respectively. The colours to use for the chromosomes in the upper graph are specified by the COLOURS option using either a text of colour names or a variate of RGB values (see the PEN directive for details). If COLOURS is not set, the default is to use the default colours of pens 1, 2, and so on, up to the number of chromosomes. By default, the plot is sent to the screen. However, you can supply a file for the plot, using the
DFILENAME parameter. You can discover the types of graphics file that are supported by running the command
The Wald test statistics, their numbers of degrees of freedom and the associated probability values on a -log10 scale can be saved by the
MINLOG10P parameters, respectively. The
LAMBDA parameter can save the inflation factor, estimated as the slope of the QQ plot of the -log10(P) values. The
QSAVE parameter can save a pointer containing information and results for the significant markers. The elements of the pointer are labelled as follows to simplify their subsequent use:
||stores the string
||index numbers of the significant markers,|
||probability values on a -log10 scale,|
||label of the relevant allele,|
||standard errors of the effects.|
These are all pointers, with an element for each chromosome. The elements of the chromosome pointers are variates for all components except the standard errors of differences, which are scalars.
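The inflation factor saved by LAMBDA is described above as the slope of the QQ plot of the -log10(P) values. One way such a slope could be computed is sketched below in Python. The plotting positions (i - 0.5)/n for the expected p-values and the choice of a regression through the origin are assumptions; the procedure may use a different convention.

```python
import math

def inflation_factor(minlog10p):
    """Slope through the origin of observed against expected -log10(P)."""
    n = len(minlog10p)
    observed = sorted(minlog10p, reverse=True)
    # Expected -log10(P) of the i-th smallest of n uniform p-values,
    # using plotting positions (i - 0.5) / n (an assumed convention).
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    sxy = sum(x * y for x, y in zip(expected, observed))
    sxx = sum(x * x for x in expected)
    return sxy / sxx

# Hypothetical values: p-values that sit exactly on the null expectation,
# and the same values inflated by 30% on the -log10 scale.
null_values = [-math.log10((i - 0.5) / 200) for i in range(1, 201)]
inflated_values = [1.3 * v for v in null_values]

lambda_null = inflation_factor(null_values)
lambda_inflated = inflation_factor(inflated_values)
```

A value close to 1 indicates that the observed test statistics follow their null distribution for most markers, whereas a value well above 1 suggests that the relatedness model has not fully accounted for population structure.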
QSASSOCIATION performs a mixed model marker-trait association analysis, or LD mapping. It takes account of the heterogeneous genetic relatedness between individuals in the population (sometimes referred to as “population structure”) using one of three possible models, specified by the RELATIONSHIPMODEL option, as defined below. The model for marker-trait association may include the following terms: an intercept μ, the effects associated with k principal components PCscoreki (fixed or random), the effects of genotype groups Groupk (fixed or random) and the effects of the tested markers MK (fixed). The RELATIONSHIPMODEL option specifies which of the three possible models to use for the relatedness, and the MODELPART option controls whether these terms are treated as fixed or random.
|Model||Fixed||Fixed or random||Fixed||Random|
|Eigenanalysis||μ +||Σi PCscoreki +||MK +||Gi|
|Kinship||μ +|| ||MK +||Gi with G ~ N(0,2KσG)|
|Subpopulations||μ +||Groupk +||MK +||Gi|
|Null||μ +|| ||MK +||Gi|
A Wald test is then used for each marker, individually, to test the null hypothesis that its effect is zero. The most frequent allele is set as the reference level. Marker allele frequencies, effects and standard errors are stored.
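For a bi-allelic marker the Wald test above involves a single effect, so the conversion from an estimated effect and its standard error to a -log10(P) value can be sketched as follows. This Python illustration assumes a chi-square reference distribution with 1 degree of freedom; the function name and the example effect and standard error are hypothetical, and multi-allelic markers would need more degrees of freedom.

```python
import math

def wald_single_effect(effect, se):
    """Wald statistic and -log10(P) for one fixed effect (1 d.f.)."""
    wald = (effect / se) ** 2
    # Upper tail of a chi-square with 1 d.f.: P(X > w) = erfc(sqrt(w / 2)).
    p = math.erfc(math.sqrt(wald / 2))
    return wald, -math.log10(p)

# Hypothetical marker effect and standard error.
wald, minlog10p = wald_single_effect(effect=0.8, se=0.25)

# Compare with the default THRESHOLD of 2 on the -log10 scale.
significant = minlog10p > 2
```

The comparison with the threshold is made on the -log10 scale, which matches the scale on which the probabilities are saved and plotted.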
Restrictions are not allowed.
Patterson, N., Price, A.L., Reich, D. (2006). Population structure and eigenanalysis. PLoS Genetics, 2, e190. doi:10.1371/journal.pgen.0020190
Commands for: Statistical genetics and QTL estimation.
CAPTION 'QSASSOCIATION example 1: a data set with bi-allelic markers';\
  STYLE=meta
QIMPORT [POPULATION=amp] '%GENDIR%/Examples/QAssociation_geno.txt';\
  MAPFILE='%GENDIR%/Examples/QAssociation_map.txt'; MKSCORES=mk;\
  CHROMOSOMES=mkchr; POSITIONS=mkpos; MKNAMES=mknames;\
  IDMGENOTYPES=geno_id
IMPORT [PRINT=*] '%GENDIR%/Examples/QAssociation_pheno.csv'; ISAVE=vars
" The relationship model is kinship, with QKINSHIPMATRIX used to
  estimate the K matrix. The threshold is defined as 2.5."
QSASSOCIATION [PRINT=summary,progress; PLOT=profile,map,QQ;\
  RELATIONSHIPMODEL=kinship; METHOD=fast;\
  THRMETHOD=given; THRESHOLD=2.5; DISTANCE=15; MINORALLELE=0.1;\
  KMATRIX=K; KMETHOD=dice] yield; GENOTYPE=genotypes; MKSCORES=mk;\
  CHROMOSOMES=mkchr; POSITIONS=mkpos; IDMGENOTYPES=geno_id;\
  MKNAMES=mknames; MINLOG10P=PvalK; LAMBDA=inf_factorK;\
  QSAVE=outputKFast
PRINT outputKFast
PRINT outputKFast[3...6]; FIELD=18
PRINT outputKFast,outputKFast[7,8][1,2],outputKFast[9,10]; FIELD=10

CAPTION 'QSASSOCIATION example 2: a data set with multi-allelic markers';\
  STYLE=meta
DELETE [REDEFINE=yes] mk,mkchr,mkpos,mknames,geno_id,vars,K
QIMPORT [POPULATION=amp] '%GENDIR%/Examples/LD_example_geno.txt';\
  MAPFILE='%GENDIR%/Examples/LD_example_map.txt'; MKSCORES=mk;\
  CHROMOSOMES=mkchr; POSITIONS=mkpos; MKNAMES=mknames;\
  IDMGENOTYPES=geno_id
IMPORT [PRINT=*] '%GENDIR%/Examples/LD_example_pheno.csv'; ISAVE=vars
" The relationship model is eigenanalysis, with QEIGENANALYSIS used to
  calculate the principal component scores. The threshold is calculated
  and saved in scalar Thr2."
QSASSOCIATION [PRINT=summary,progress; PLOT=profile,map,QQ; MINORALLELE=0.05;\
  ALPHA=0.05; THRMETHOD=neff; THRESHOLD=Thr2; METHOD=exact;\
  RELATIONSHIPMODEL=eigenanalysis; SCORES=PCscores; SCALING=none;\
  STANDARDIZE=frequency] gy_th; GENOTYPE=geno; MKSCORES=mk;\
  CHROMOSOMES=mkchr; POSITIONS=mkpos; MKNAMES=mknames;\
  IDMGENOTYPES=geno_id; MINLOG10P=PvalE; LAMBDA=IF_E; QSAVE=outputE
PRINT outputE
PRINT outputE[3...6]; FIELD=18
PRINT outputE,outputE[8,9][1,2],outputE[10,11]; FIELD=10

CAPTION !t('QSASSOCIATION Example 3: a large data set (10000 markers)',\
  'with bi-allelic markers.'); STYLE=meta
DELETE [REDEFINE=yes] mk,mkchr,mkpos,mknames,geno_id,vars,PCscores
GET [WORKINGDIRECTORY=wdir]
%CD '%GENDIR%/Examples/'
IMPORT [PRINT=*] 'data10000_pheno.csv'
SPLOAD [PRINT=*] 'data10000_Kmat.gsh'
QSASSOCIATION [PRINT=summary,progress; PLOT=profile,QQ; METHOD=fast;\
  MINORALLELE=0.05; ALPHA=0.05; THRMETHOD=bonferroni;\
  DISTANCE=1; THRESHOLD=Thr3; RELATIONSHIPMODEL=kinship;\
  KMATRIX=Kmat] y001; GENOTYPE=genotype;\
  GENFILENAME='data10000_geno.csv';\
  MAPFILENAME='data10000_map.csv';\
  MINLOG10P=Pval; LAMBDA=IF; QSAVE=output
%CD wdir