
QMATCH procedure

Matches different data structures to be used in QTL estimation (L.C.P. Keizer & J.T.N.M. Thissen).

Options

PRINT = string tokens What to print (summary, details); default summ
GEN%MISSING = scalar Percentage of missing values allowed for a genotype; default 50
MK%MISSING = scalar Percentage of missing values allowed for a marker; default 50
MK%EXTREME = scalar Extreme allele percentage allowed for a marker; default 5
GENSELECTION = variate Logical variate containing the value one for the genotypes to retain and zero for those to remove (supersedes the options GEN%MISSING, MK%MISSING and MK%EXTREME)
MKSELECTION = variate Logical variate containing the value one for the markers to retain and zero for those to remove (supersedes the options GEN%MISSING, MK%MISSING and MK%EXTREME)
POPULATIONTYPE = string token Type of population (BC1, DH1, F2, RIL, BCxSy, CP, AMP); must be set
OUTFILEPREFIX = text Prefix for the output file names; default * i.e. files not saved

Parameters

TRAITS = pointers or variates Quantitative traits
GENOTYPES = factors Genotype factors corresponding to the traits
ENVIRONMENTS = factors Environment factors corresponding to the traits
MKSCORES = pointers Marker scores; must be set
CHROMOSOMES = factors Chromosomes corresponding to the markers
POSITIONS = variates Positions on the chromosomes corresponding to the markers
MKNAMES = texts Names of the markers
IDMGENOTYPES = texts Labels for the genotypes corresponding to the markers
PARENTS = pointers Parent information
IDPARENTS = texts Labels used to identify the parents
KMATRIX = symmetric matrices Kinship matrices containing coefficients of coancestries
SUBPOPULATIONS = factors Groups of genotypes
STRAITS = pointers or variates Saves the sorted quantitative traits
SGENOTYPES = factors Saves the sorted genotype factors
SENVIRONMENTS = factors Saves the sorted environment factors
SMKSCORES = pointers Saves the sorted marker scores; must be set
SCHROMOSOMES = factors Saves the sorted chromosomes corresponding to the markers
SPOSITIONS = variates Saves the sorted positions on the chromosomes corresponding to the markers
SMKNAMES = texts Saves the sorted names of the markers
SIDMGENOTYPES = texts Saves the sorted labels for the genotypes
SPARENTS = pointers Saves the sorted parent information
SIDPARENTS = texts Saves the sorted labels used to identify the parents
SKMATRIX = symmetric matrices Saves the sorted kinship matrices
SSUBPOPULATIONS = factors Saves the sorted groups of genotypes

Description

QMATCH matches the various data structures that can be used in QTL detection. These include molecular marker information for sets of genotypes, map information, phenotypic information, and genetic relatedness information in the form of genotype groupings and kinship matrices. QMATCH can be used to align all these data for further analyses.

Molecular marker information is supplied by the MKSCORES, MKNAMES and IDMGENOTYPES parameters; MKSCORES must be set. The type of population from which the genotypes come must be specified using the POPULATIONTYPE option. If parental genotypes are known (designed crosses), the marker scores of the parents can be supplied by the PARENTS parameter, and their labels can be specified by the IDPARENTS parameter. Molecular map information is supplied by the CHROMOSOMES and POSITIONS parameters. Phenotypic data are specified by the TRAITS parameter, as a variate for a single trait, or as a pointer containing several variates for more than one trait. The GENOTYPES parameter supplies a factor defining the genotype of each trait observation, and the ENVIRONMENTS parameter can supply a factor defining the environment of each observation when the data are from a multi-environment trial. Genetic relatedness information, used in association mapping analyses, can be given as a kinship matrix using the KMATRIX parameter, or a grouping factor using the SUBPOPULATIONS parameter.

QMATCH matches the different data sets so that they refer to the same set of genotypes (MKSCORES and TRAITS) or the same set of markers (MKSCORES and the map structures). Genotypes and/or markers that are not common to the data sets are removed.
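The matching step amounts to intersecting the genotype labels and the marker names across the data sets. The Python sketch below illustrates this under assumed in-memory layouts: plain dictionaries and lists stand in for the Genstat pointers, factors and texts. The function name match_common is hypothetical, and this is not how QMATCH itself is implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory layouts (not Genstat structures):
Scores = Dict[str, Dict[str, str]]      # marker name -> {genotype label: score}
Traits = List[Tuple[str, float]]        # (genotype label, trait value) pairs
GenMap = Dict[str, Tuple[str, float]]   # marker name -> (chromosome, position)


def match_common(scores: Scores, traits: Traits, genmap: GenMap) -> Tuple[Scores, Traits, GenMap]:
    """Keep only the genotypes and markers shared by the data sets."""
    # Genotypes present in both the marker scores and the trait data.
    mk_genos = {g for per_marker in scores.values() for g in per_marker}
    common_genos = mk_genos & {g for g, _ in traits}
    # Markers present in both the marker scores and the map.
    common_markers = set(scores) & set(genmap)

    matched_scores = {
        m: {g: s for g, s in scores[m].items() if g in common_genos}
        for m in scores
        if m in common_markers
    }
    matched_traits = [(g, y) for g, y in traits if g in common_genos]
    matched_map = {m: pos for m, pos in genmap.items() if m in common_markers}
    return matched_scores, matched_traits, matched_map


scores = {"m1": {"G1": "A", "G2": "B", "G3": "A"},
          "m2": {"G1": "B", "G2": "B", "G3": "A"}}
traits = [("G1", 5.2), ("G3", 4.8), ("G4", 6.1)]   # G4 has no marker scores
genmap = {"m1": ("1", 10.0), "m3": ("2", 5.0)}      # m2 is unmapped, m3 unscored
s_scores, s_traits, s_map = match_common(scores, traits, genmap)
# s_scores == {"m1": {"G1": "A", "G3": "A"}}
# s_traits == [("G1", 5.2), ("G3", 4.8)]
# s_map == {"m1": ("1", 10.0)}
```

Genotype G2 is dropped because it has no trait data, and G4 because it has no marker scores. Marker m2 is dropped because it is not on the map, and m3 because it has no scores.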

In addition to subsetting the data, the procedure can also be used to remove genotypes and/or markers with too many missing values. The GEN%MISSING option sets a threshold on the percentage of missing values within each genotype (default 50). Genotypes with more than that percentage of missing scores are excluded. Similarly, the MK%MISSING option sets a threshold on the percentage of missing values within each marker (default 50). Markers with more than that percentage of missing scores are excluded. Markers can also be excluded using the MK%EXTREME option (default 5). A marker is then excluded if the percentage of one of its alleles is less than the MK%EXTREME value, so that its allele frequencies are too extreme.
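The three checks can be sketched in Python as follows. This is a sketch under stated assumptions, not the Genstat code:

- Scores are biallelic, coded "A"/"B", with None for a missing value. Heterozygote classes, as in an F2, are not handled.
- Genotypes are screened before markers, and marker percentages are computed over the retained genotypes.
- MK%EXTREME is read as the minimum allowed percentage for either allele.

The function filter_scores and its argument names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Score = Optional[str]  # None stands for a missing marker score


def filter_scores(
    scores: List[List[Score]],            # rows = genotypes, columns = markers
    gen_missing: float = 50.0,            # plays the role of GEN%MISSING
    mk_missing: float = 50.0,             # plays the role of MK%MISSING
    mk_extreme: float = 5.0,              # plays the role of MK%EXTREME
    alleles: Sequence[str] = ("A", "B"),  # assumed biallelic coding
) -> Tuple[List[int], List[int]]:
    """Return the indices of the genotypes and markers that are retained."""
    n_markers = len(scores[0])
    # Genotypes with MORE than gen_missing % missing scores are dropped.
    keep_gen = [
        i for i, row in enumerate(scores)
        if 100.0 * sum(s is None for s in row) / n_markers <= gen_missing
    ]
    keep_mk = []
    for j in range(n_markers):
        column = [scores[i][j] for i in keep_gen]
        observed = [s for s in column if s is not None]
        n_missing = len(column) - len(observed)
        # Markers with MORE than mk_missing % missing scores are dropped.
        if not observed or 100.0 * n_missing / len(column) > mk_missing:
            continue
        # Markers where either allele is rarer than mk_extreme % are dropped.
        pcts = [100.0 * observed.count(a) / len(observed) for a in alleles]
        if min(pcts) < mk_extreme:
            continue
        keep_mk.append(j)
    return keep_gen, keep_mk


data = [
    ["A", "B", "A", None],     # G1: 25 % missing, kept
    ["A", "A", None, None],    # G2: 50 % missing, kept (not more than 50 %)
    [None, None, None, "B"],   # G3: 75 % missing, removed
    ["B", "B", "A", "A"],      # G4: complete, kept
]
genos, markers = filter_scores(data)
# genos == [0, 1, 3]; markers == [0, 1]
# marker 2 is monomorphic (allele B at 0 %), marker 3 is 2/3 missing
```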

In some situations you may already know which genotypes or markers you want to remove. If so, you can set the GENSELECTION and MKSELECTION options, respectively. The setting of each option is a logical variate containing the value one for the genotypes or markers to retain, and zero for those that are to be removed. If either of these options is set, the GEN%MISSING, MK%MISSING and MK%EXTREME options are ignored, and no checks are made on missing values or extreme allele percentages.
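The precedence rule for the selection options can be expressed as a small Python sketch. The threshold checks run only when neither selection is given. It is assumed here that, when only one of the two selections is set, every genotype or marker in the other dimension is retained, since no threshold checks are then made. The name choose_retained and the threshold_filter callback are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def choose_retained(
    n_gen: int,
    n_mk: int,
    genselection: Optional[Sequence[float]],   # cf. GENSELECTION (1 = keep, 0 = remove)
    mkselection: Optional[Sequence[float]],    # cf. MKSELECTION (1 = keep, 0 = remove)
    threshold_filter: Callable[[], Tuple[List[int], List[int]]],
) -> Tuple[List[int], List[int]]:
    """Return retained genotype and marker indices, giving selections precedence."""
    if genselection is None and mkselection is None:
        # Neither selection is set: fall back to the missing/extreme checks.
        return threshold_filter()
    # At least one selection is set: no threshold checks are made at all,
    # so an unset selection keeps everything in that dimension (assumption).
    gen = genselection if genselection is not None else [1] * n_gen
    mk = mkselection if mkselection is not None else [1] * n_mk
    if len(gen) != n_gen or len(mk) != n_mk:
        raise ValueError("selection length does not match the data")
    keep_gen = [i for i, flag in enumerate(gen) if flag == 1]
    keep_mk = [j for j, flag in enumerate(mk) if flag == 1]
    return keep_gen, keep_mk


def never_called() -> Tuple[List[int], List[int]]:
    raise AssertionError("thresholds should be ignored when a selection is set")


print(choose_retained(3, 4, None, [1, 0, 1, 1], never_called))
# ([0, 1, 2], [0, 2, 3])
```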

The modified data structures can be saved using the parameters beginning with the prefix S. The SMKSCORES parameter, which must be set, saves the marker scores. If only the MKSCORES and SMKSCORES parameters are specified, the SMKSCORES variates are sorted according to the labels of the MKSCORES pointer. If the MKNAMES and/or IDMGENOTYPES parameters are also specified, sorting is done according to their values. If the map structures (CHROMOSOMES and POSITIONS) are also set, the SMKSCORES variates are first sorted in ascending order of the levels of the CHROMOSOMES factor, and then within each chromosome (linkage group) in ascending order of the POSITIONS. If the SMKNAMES, SCHROMOSOMES, SPOSITIONS, SPARENTS and SIDPARENTS parameters are set, their values are sorted in the same way. The structures corresponding to the traits (i.e. STRAITS, SGENOTYPES and SENVIRONMENTS) are sorted in the same way as the SIDMGENOTYPES text. If these structures contain values from more than one environment, the sorting according to the values of SIDMGENOTYPES is done within each environment. Finally, if the KMATRIX and/or SUBPOPULATIONS parameters are set, their sorted values can be saved by the SKMATRIX and SSUBPOPULATIONS parameters, respectively.
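The two sort orders described above can be sketched in Python:

- Markers are ordered by chromosome, then by position within each chromosome.
- Trait observations are grouped by environment, then follow the sorted genotype order within each environment.

Integer chromosome levels and environments ordered by their labels are assumptions. sort_markers and sort_traits are hypothetical helpers, not part of Genstat.

```python
from typing import Dict, List, Tuple


def sort_markers(names: List[str], chrom: List[int], pos: List[float]) -> List[str]:
    """Order markers by chromosome level, then by position within each chromosome."""
    order = sorted(range(len(names)), key=lambda i: (chrom[i], pos[i]))
    return [names[i] for i in order]


def sort_traits(
    obs: List[Tuple[str, str, float]],  # (environment, genotype, trait value)
    geno_order: List[str],              # sorted genotype labels, cf. SIDMGENOTYPES
) -> List[Tuple[str, str, float]]:
    """Sort by environment, then follow the genotype order within each environment."""
    rank: Dict[str, int] = {g: k for k, g in enumerate(geno_order)}
    return sorted(obs, key=lambda o: (o[0], rank[o[1]]))


print(sort_markers(["m3", "m1", "m2", "m4"], [2, 1, 1, 2], [5.0, 30.0, 12.5, 0.0]))
# ['m2', 'm1', 'm4', 'm3']
obs = [("E2", "G2", 3.1), ("E1", "G2", 2.9), ("E1", "G1", 3.4), ("E2", "G1", 3.0)]
print(sort_traits(obs, ["G1", "G2"]))
# [('E1', 'G1', 3.4), ('E1', 'G2', 2.9), ('E2', 'G1', 3.0), ('E2', 'G2', 3.1)]
```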

The OUTFILEPREFIX option can be used to define the initial part of the names of the files in which to save the modified data. The text supplied by the option should not contain an extension, as the extension is defined automatically for the different files. The saved marker scores are stored in a Flapjack genotype file with '_geno.txt' appended to OUTFILEPREFIX, the saved map structures in a Flapjack map file with '_map.txt' appended, and the saved phenotypic structures in a Genstat spreadsheet file with '_pheno.gsh' appended. The saved kinship matrix and the saved subpopulation structures are also stored in Genstat spreadsheet files, with '_kmat.gsh' and '_subpop.gsh' appended, respectively.
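The file-naming scheme can be summarized by a small Python sketch that appends the quoted suffixes to a prefix and rejects a prefix that already has an extension. The function output_file_names and the dictionary keys are hypothetical. The sketch also assumes the suffixes are simply concatenated to the prefix.

```python
import os
from typing import Dict, Optional

# File-name suffixes quoted in the description of OUTFILEPREFIX.
SUFFIXES = {
    "marker scores": "_geno.txt",      # Flapjack genotype file
    "map": "_map.txt",                 # Flapjack map file
    "phenotypes": "_pheno.gsh",        # Genstat spreadsheet
    "kinship": "_kmat.gsh",            # Genstat spreadsheet
    "subpopulations": "_subpop.gsh",   # Genstat spreadsheet
}


def output_file_names(prefix: Optional[str]) -> Dict[str, str]:
    """Map each kind of saved data to its file name (illustrative only)."""
    if prefix is None:
        return {}  # like the default *, i.e. nothing is saved
    if os.path.splitext(prefix)[1]:
        raise ValueError("the prefix should not contain an extension")
    return {kind: prefix + suffix for kind, suffix in SUFFIXES.items()}


print(output_file_names("LD_match")["marker scores"])  # LD_match_geno.txt
```

Presumably only the files for structures that are actually saved are written. In the example below, only marker scores and map structures are supplied, so files such as LD_match_geno.txt and LD_match_map.txt would be the relevant ones.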

The PRINT option controls the printed output, with settings:

    summary for a general summary of the changes, and
    details for details of the omitted genotypes and markers, etc.

Options: PRINT, GEN%MISSING, MK%MISSING, MK%EXTREME, GENSELECTION, MKSELECTION, POPULATIONTYPE, OUTFILEPREFIX.

Parameters: TRAITS, GENOTYPES, ENVIRONMENTS, MKSCORES, CHROMOSOMES, POSITIONS, MKNAMES, IDMGENOTYPES, PARENTS, IDPARENTS, KMATRIX, SUBPOPULATIONS, STRAITS, SGENOTYPES, SENVIRONMENTS, SMKSCORES, SCHROMOSOMES, SPOSITIONS, SMKNAMES, SIDMGENOTYPES, SPARENTS, SIDPARENTS, SKMATRIX, SSUBPOPULATIONS.

Action with RESTRICT

Restrictions are not allowed.

See also

Procedure: QMKDIAGNOSTICS.

Commands for: Statistical genetics and QTL estimation.

Example

CAPTION 'QMATCH example'; STYLE=meta
QIMPORT [POPULATION=AMP] '%GENDIR%/Examples/LD_example_geno.txt';\ 
        MAPFILE='%GENDIR%/Examples/LD_example_map.txt';\ 
        MKSCORES=mkscores; MKNAMES=mknames; CHROMOSOMES=mkchr;\
        POSITIONS=mkpos; IDMGENOTYPES=idmgeno
QMATCH  [POPULATION=AMP; OUTFILE='LD_match']\ 
        MKSCORES=mkscores; CHROMOSOMES=mkchr; POSITIONS=mkpos;\ 
        MKNAMES=mknames; IDMGENOTYPES=idmgeno; SMKSCORES=smkscores
Updated on March 6, 2019
