QGSELECT procedure

Obtains a representative selection of genotypes by means of genetic distance sampling or genetic distance optimization (J. Jansen & J.T.N.M. Thissen).

Options

PRINT = string tokens What to print (summary, monitoring); default summ
NCLUSTERS = scalar The number of genotypes to be selected; must be set
METHOD = string token Method to be used (sampling, optimization); default samp

Parameters

GENOTYPES = factors Genotype factor; must be set
SIMILARITY = symmetric matrices Input similarity matrix for each selection; must be set
PRIORGROUPS = factors Defines prior groupings of the genotypes
SELECTED = variates Logical variate indicating whether a genotype is selected (1) as cluster centre or not (0)
NEIGHBOURS = variates Saves the nearest cluster centres of the genotypes
DISTANCES = variates Saves the distances of the genotypes to the nearest cluster centre
SEED = scalars Seed for randomization at the start; default 0

Description

QGSELECT selects a representative subset of genotypes using a similarity matrix, provided by the SIMILARITY parameter.

The METHOD option specifies how the genotypes are selected, with settings:

    sampling genetic distance sampling using the method of Jansen & Van Hintum (2006), or
    optimization genetic distance optimization based on K-medoids cluster analysis (Kaufman & Rousseeuw 1990).

The default is METHOD=sampling.

The factor identifying the genotypes must be supplied by the GENOTYPES parameter, and the number of genotypes to be selected must be specified by the NCLUSTERS option. Prior information about the grouping of the genotypes can be supplied using the PRIORGROUPS factor.
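As a minimal sketch of the alternative method, the calls below request genetic distance optimization instead of the default sampling. They assume that the factor geno and the symmetric matrix kmatrix have already been formed as in the Example section. The identifiers selopt and selgrp, and the prior-grouping factor origin, are hypothetical names introduced only for illustration.

```genstat
"Select 10 genotypes by genetic distance optimization (K-medoids)."
"geno and kmatrix are assumed to exist; selopt is a hypothetical name."
QGSELECT [PRINT=summary; NCLUSTERS=10; METHOD=optimization]\
         GENOTYPES=geno; SIMILARITY=kmatrix; SELECTED=selopt

"The same call with a prior grouping; origin is a hypothetical factor."
QGSELECT [PRINT=summary; NCLUSTERS=10; METHOD=optimization]\
         GENOTYPES=geno; SIMILARITY=kmatrix; PRIORGROUPS=origin;\
         SELECTED=selgrp
```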

The SEED parameter specifies the seed used to randomize the genotypes at the start. The default value of zero continues an existing sequence of random numbers or, if there is none, initializes the seed automatically.
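Because a non-zero seed fixes the starting randomization, repeating a call with the same SEED would be expected to reproduce the selection. The sketch below is one way to check this, assuming geno and kmatrix exist as in the Example section. The identifiers sel1, sel2 and ndiff are hypothetical.

```genstat
"Two runs with the same non-zero seed; sel1 and sel2 are hypothetical names."
QGSELECT [NCLUSTERS=10] GENOTYPES=geno; SIMILARITY=kmatrix;\
         SELECTED=sel1; SEED=12345
QGSELECT [NCLUSTERS=10] GENOTYPES=geno; SIMILARITY=kmatrix;\
         SELECTED=sel2; SEED=12345

"Count the genotypes whose selection status differs between the two runs."
CALCULATE ndiff = SUM(sel1 .NE. sel2)
PRINT     ndiff
```

A value of zero for ndiff would indicate that the two runs selected the same genotypes.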

The genotype selection can be saved by the SELECTED parameter, in a logical variate containing the value one for each genotype selected as a cluster centre, and zero for each genotype that is not selected. The NEIGHBOURS parameter saves the nearest cluster centre for each genotype, and the DISTANCES parameter saves the distance from each genotype to its nearest cluster centre.
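The saved structures can be inspected with ordinary Genstat directives. The sketch below assumes that QGSELECT has been run as in the Example section, so that SELECTED, NN and DISTNN exist, and that these variates have one value per unit of the factor geno. The scalar nselected is a hypothetical name.

```genstat
"Number of genotypes flagged as cluster centres; this should match NCLUSTERS."
CALCULATE nselected = SUM(SELECTED)
PRINT     nselected

"List each genotype with its selection flag, nearest centre and distance."
PRINT     geno,SELECTED,NN,DISTNN
```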

The PRINT option controls the printed output, with settings:

    summary for a summary of the selection, and
    monitoring for monitoring information.

Options: PRINT, NCLUSTERS, METHOD.

Parameters: GENOTYPES, SIMILARITY, PRIORGROUPS, SELECTED, NEIGHBOURS, DISTANCES, SEED.

Action with RESTRICT

Restrictions are not allowed.

References

Jansen, J. & Th.J.L. van Hintum (2006). Genetic distance sampling: a novel sampling method for obtaining core collections using genetic distances with an application to cultivated lettuce. Theor. Appl. Genet., 114, 421-428.

Kaufman, L. & P.J. Rousseeuw (1990). Finding Groups in Data: an Introduction to Cluster Analysis. Wiley, New York.

See also

Procedure: QMKSELECT.

Commands for: Statistical genetics and QTL estimation.

Example

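CAPTION  'QGSELECT example'; STYLE=meta
"Import the marker scores for an F2 maize population."
QIMPORT  [POPULATION=F2] '%GENDIR%/Examples/F2maize_geno.txt';\
         MKSCORES=mkscores; MKNAMES=mknames; IDMGENOTYPES=idmgeno
"Form the genotype factor from the genotype labels."
GROUPS   idmgeno; FACTOR=geno
"Form the similarity matrix from the marker scores (Dice coefficient)."
QKINSHIP [METHOD=dice] mkscores; IDMGENOTYPES=idmgeno; KMATRIX=kmatrix
"Select 10 genotypes by genetic distance sampling and save the results."
QGSELECT [PRINT=MONITORING, SUMMARY; NCLUSTERS=10; METHOD=sampling]\
         GENOTYPES=geno; SIMILARITY=kmatrix;\
         SELECTED=SELECTED; NEIGHBOURS=NN; DISTANCES=DISTNN; SEED=12345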

Updated on March 6, 2019
