QMKSELECT procedure

Obtains a representative selection of markers by means of genetic distance sampling or genetic distance optimization (J. Jansen & J.T.N.M. Thissen).

Options

`PRINT` = string tokens	What to print (`summary`, `monitoring`); default `summ`
`NCLUSTERS` = scalar	The number of markers to be selected; must be set
`METHOD` = string token	Method to be used (`sampling`, `optimization`); default `samp`

Parameters

`MKNAMES` = texts	Names of the markers; must be set
`RECFREQUENCY` = symmetric matrices	Matrix of recombination frequencies for each selection; must be set
`PRIORGROUPS` = factors	Defines prior groupings of the markers
`SELECTED` = variates	Logical variate indicating whether a marker is selected (1) as cluster centre or not (0)
`NEIGHBOURS` = variates	Saves the nearest cluster centres of the markers
`DISTANCES` = variates	Saves the distances of the markers to the nearest cluster centre
`SEED` = scalars	Seed for randomization at the start; default 0

Description

QMKSELECT selects a representative subset of markers using a matrix of recombination frequencies, provided by the RECFREQUENCY parameter.

The METHOD option specifies whether to use genetic distance sampling or genetic distance optimization, with settings:

`sampling`	genetic distance sampling using the method of Jansen & Van Hintum (2006), or
`optimization`	genetic distance optimization based on K-medoids cluster analysis (Kaufman & Rousseeuw 1990).

The default is METHOD=sampling.

The marker names must be supplied by the MKNAMES parameter, and the number of markers to be selected must be specified by the NCLUSTERS option. Prior information about the grouping of the markers can be supplied using the PRIORGROUPS factor.

The SEED parameter specifies the seed to use to randomize the markers at the start. The default value of zero continues an existing sequence, or (if none) initializes the seed automatically.

The marker selection can be saved by the SELECTED parameter, in a logical variate containing one for each marker selected as a cluster centre, and zero for the markers that are not selected. The NEIGHBOURS parameter saves the nearest cluster centre for each marker, and the DISTANCES parameter saves the distances of each marker to the nearest cluster centre.
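As a rough illustration of how the three saved structures relate under METHOD=optimization, the sketch below runs a simplified alternating K-medoids search on a small made-up distance matrix. It is written in Python rather than Genstat and is not the QMKSELECT implementation: the function name `k_medoids`, the toy marker positions and the zero-based indices are assumptions made for the example, and the procedure itself may use a different search strategy.

```python
import random


def k_medoids(dist, k, seed=0, max_iter=100):
    """Simplified alternating K-medoids on a symmetric distance matrix.

    Illustrative only; returns (selected, nearest, distances) as plain lists.
    """
    n = len(dist)
    rng = random.Random(seed)
    medoids = sorted(rng.sample(range(n), k))  # random starting centres

    def assign(centres):
        # Each centre belongs to itself; every other item goes to its
        # closest centre (ties go to the centre listed first).
        centre_set = set(centres)
        return [i if i in centre_set
                else min(centres, key=lambda m: dist[i][m])
                for i in range(n)]

    for _ in range(max_iter):
        nearest = assign(medoids)
        new_medoids = []
        for m in medoids:
            members = [i for i in range(n) if nearest[i] == m]
            # New centre: the member with the smallest total distance
            # to the other members of its cluster.
            best = min(members, key=lambda c: sum(dist[c][j] for j in members))
            new_medoids.append(best)
        new_medoids.sort()
        if new_medoids == medoids:  # no change, so stop
            break
        medoids = new_medoids

    nearest = assign(medoids)
    selected = [1 if i in medoids else 0 for i in range(n)]
    distances = [dist[i][nearest[i]] for i in range(n)]
    return selected, nearest, distances


# Toy data (hypothetical): six markers at positions on a line,
# with distance taken as the absolute difference in position.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]

selected, nearest, distances = k_medoids(dist, k=2, seed=12345)
print(selected, nearest, distances)
```

In this sketch `selected` plays the role of SELECTED, `nearest` of NEIGHBOURS and `distances` of DISTANCES. On the toy data the two centres end up in the middle of each group of three positions.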

The PRINT option controls the printed output, with settings:

`summary`	for a summary of the selection, and
`monitoring`	for monitoring information.

Options: PRINT, NCLUSTERS, METHOD.

Parameters: MKNAMES, RECFREQUENCY, PRIORGROUPS, SELECTED, NEIGHBOURS, DISTANCES, SEED.

Action with `RESTRICT`

Restrictions are not allowed.

References

Jansen, J. & Th.J.L. van Hintum (2006). Genetic distance sampling: a novel sampling method for obtaining core collections using genetic distances with an application to cultivated lettuce. Theor. Appl. Genet., 114, 421-428.

Kaufman, L. & P.J. Rousseeuw (1990). Finding Groups in Data: an Introduction to Cluster Analysis. Wiley, New York.

Example

CAPTION         'QMKSELECT example'; STYLE=meta
QIMPORT         [POPULATION=F2] '%GENDIR%/Examples/F2maize_geno.txt';\ 
                MKSCORES=mkscores; MKNAMES=mknames; IDMGENOTYPES=idmgeno;\ 
                PARENTS=parents
GROUPS          mknames; FACTOR=markers
QRECOMBINATIONS [POPULATION=F2; METHOD=twopoint; TITLE='F2 maize']\ 
                MKSCORES=mkscores; MKNAMES=mknames; RECFREQUENCIES=recfreq;\ 
                PARENTS=parents
QMKSELECT       [PRINT=MONITORING,SUMMARY; NCLUSTERS=10; METHOD=sampling]\ 
                MKNAMES=mknames; RECFREQUENCIES=recfreq;\ 
                SELECTED=SELECTED; NEIGHBOURS=NN; DISTANCES=DISTNN;\ 
                SEED=12345

Updated on March 6, 2019
