QIMPORT procedure

Imports genotypic and phenotypic data for QTL analysis (D.A. Murray).

Options

PRINT = string token What to print (catalogue, errorreport); default cata, erro
POPULATIONTYPE = string token Type of population (BC1, DH1, F2, RIL, BCxSy, CP, AMP); must be set
MISSING = text Character representing a missing genotype in Flapjack or R/QTL format; default '-'
SEPARATOR = text Character separating data values in Flapjack format; default separates them by tabs
ASEPARATOR = text Character separating allele values in Flapjack format; default '/'
FJROWS = string token Specifies whether the genotypes or markers are stored on the rows in Flapjack format (genotypes, markers); default geno
NPARENTS = scalar Number of parents in Flapjack file; default 0 for population AMP, 4 for CP, and 2 otherwise
NMKERROR = scalar For data in Flapjack format, this sets a limit on the number of markers that may be found to contain errors before the import is abandoned; default 200
MKREMOVE = string token Whether to remove markers with errors in Flapjack format automatically (yes, no); default no

Parameters

FILENAME = texts Name of the file for import
MAPFILENAME = texts Name of the map file (Flapjack or MapQTL(R))
PHEFILENAME = texts Name of the phenotypic file (MapQTL(R))
MKSCORES = pointers Saves the genotype codes for each marker
TRAITS = pointers Saves the trait data from the phenotypic file
CHROMOSOMES = factors Saves linkage groups for each marker
POSITIONS = variates Saves positions of the markers within linkage groups
MKNAMES = texts Saves the marker names
MKSETS = factors Saves marker sets
IDMGENOTYPES = texts Labels for genotypes
PARENTS = pointers Saves the parent information
IDPARENTS = texts Saves the labels used to identify the parents
IDFILENAME = texts Specifies a file containing genotype labels for MapQTL(R) files; if unset, they are assumed to be in the .loc file
EXCLUDEMARKERS = texts Specifies the names of any markers to exclude from an import in Flapjack format
MKERRORS = texts In Flapjack format, this saves the names of any markers that contain errors
ERRORLOCATIONS = pointers In Flapjack format, this saves a pointer to texts that identify any errors in the marker-by-genotype (individual) scores
OUTFILENAME = texts Specifies the name of a Genstat workbook (.gwb) file to save the marker scores and associated information

Description

QIMPORT loads genotypic and phenotypic data for QTL analysis. The name of the genotypic data file to be imported is specified by the FILENAME parameter. The format of the file is determined from its extension, and can be a Flapjack text genotype file (.txt), a MapQTL(R) Locus genotype file (.loc) or a comma-delimited text file (.csv). The .csv format is an extended form of the R/QTL separate genotype data .csv file format, which can include an extra column for the marker sets.

If a Flapjack genotype or MapQTL(R) Locus genotype file name is supplied, the associated map information can be supplied by setting the MAPFILENAME parameter to a file name with the extension .txt for Flapjack or .map for MapQTL(R). For Flapjack and R/QTL formats, the POPULATIONTYPE option must be set to specify the population from which the genotypes come. For MapQTL(R), the population is determined from the .loc file. The MISSING option can specify a character to identify missing genotypes in Flapjack genotype files and R/QTL files. By default, Genstat expects the genotype data in Flapjack files to be tab-delimited, but the SEPARATOR option can be used to specify an alternative separator. Similarly, by default, Genstat expects the alleles for each genotype to be separated using a '/' character, but an alternative can be supplied using the ASEPARATOR option. For the Flapjack genotype format, the FJROWS option indicates whether the genotypes or markers are stored in the rows of the file; by default the genotypes are in the rows.
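
As an illustration of the Flapjack options described above, the following sketch imports a comma-separated Flapjack file whose markers are stored on the rows. The file names, the population type, the separator characters and the identifiers used to save the results are all hypothetical, chosen only to show how the options fit together.

```genstat
"Sketch only: file names and saved identifiers are hypothetical."
"Assumes missing genotypes are coded as '?', data values are separated"
"by commas, alleles by ':', and that markers are stored on the rows."
QIMPORT [POPULATIONTYPE=DH1; MISSING='?'; SEPARATOR=','; ASEPARATOR=':';\
         FJROWS=markers]\
        FILENAME='mydata_geno.txt'; MAPFILENAME='mydata_map.txt';\
        MKSCORES=mkscores; CHROMOSOMES=chromosomes;\
        POSITIONS=positions; MKNAMES=mknames
```

If the file uses the default layout (tab-delimited, alleles separated by '/', genotypes on the rows, missing genotypes coded as '-'), only POPULATIONTYPE needs to be set.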

The marker scores for the genotypes are stored in a set of factors in the pointer supplied by the MKSCORES parameter. Each factor within the pointer will contain data for a marker, with factor labels supplied in the same order.

When importing genotypic data, the linkage groups for each marker, the marker names and the marker positions are saved using the CHROMOSOMES, MKNAMES and POSITIONS parameters, respectively. If a .csv file is imported, the grouping factor identifying any marker sets within the file can be saved using the MKSETS parameter.
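
For example, an import from an extended R/QTL .csv file that also saves the marker sets might look like the sketch below. The file name and the identifiers are hypothetical, and the file is assumed to contain the extra marker-set column.

```genstat
"Sketch only: 'mycross_geno.csv' is a hypothetical extended R/QTL file"
"that is assumed to include the extra column of marker sets."
QIMPORT [POPULATIONTYPE=F2]\
        FILENAME='mycross_geno.csv';\
        MKSCORES=mkscores; CHROMOSOMES=chromosomes; POSITIONS=positions;\
        MKNAMES=mknames; MKSETS=mksets
```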

For BC1, DH1, F2, RIL, BCxSy and CP populations, the parent information and associated names can be saved using the PARENTS and IDPARENTS parameters, respectively.

The genotype labels can be saved using the IDMGENOTYPES parameter. By default, for MapQTL(R) locus and map files, the genotype labels are the values 1 to n. However, Genstat allows individual names to be included at the bottom of the locus file, below the genotype data. The file should then include the instruction

Individual names:

followed by each individual name on a separate line in the same order as that in which the genotypes are specified for each locus. Alternatively, a text file containing the genotype labels can be supplied using the IDFILENAME parameter; each individual name should then be on a separate line in the same order as that in which the genotypes are specified for each locus in the .loc file.
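
A MapQTL(R) import that takes the genotype labels from a separate file could be sketched as follows. The file names and identifiers are hypothetical; 'mypop_ids.txt' is assumed to hold one individual name per line, in the same order as the genotypes in the .loc file.

```genstat
"Sketch only: all file names are hypothetical."
"POPULATIONTYPE is not set here, as the population is determined"
"from the .loc file for MapQTL(R) data."
QIMPORT FILENAME='mypop.loc'; MAPFILENAME='mypop.map';\
        IDFILENAME='mypop_ids.txt';\
        MKSCORES=mkscores; CHROMOSOMES=chromosomes; POSITIONS=positions;\
        MKNAMES=mknames; IDMGENOTYPES=idgenotypes
"Display the genotype labels that were read."
PRINT   idgenotypes
```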

For data in Flapjack format, markers can be excluded by setting the EXCLUDEMARKERS parameter to a text containing the names of the markers to omit. When importing Flapjack genotypic data, the parental and individual scores are checked for errors. You can set option MKREMOVE=yes to remove any markers that are found to contain errors automatically from the imported data. The NMKERROR option sets a limit on the number of markers that may be found to contain errors before the import is abandoned; the default is 200. The names of any markers that contain errors in the parent or individual genotype scores can be saved, in a text, using the MKERRORS parameter. The ERRORLOCATIONS parameter can save a pointer containing a text with marker names and a text with genotype names, identifying the marker × genotype locations of any marker score errors.
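
The error-handling settings above might be combined as in the following sketch. The marker names, file names, error limit and identifiers are hypothetical and are given only to illustrate the syntax.

```genstat
"Sketch only: marker names, file names and identifiers are hypothetical."
TEXT    [VALUES='mk012','mk047'] exclude
"Omit the two named markers, drop any markers found to contain errors,"
"and abandon the import if more than 50 markers contain errors."
QIMPORT [PRINT=catalogue,errorreport; POPULATIONTYPE=F2; NMKERROR=50;\
         MKREMOVE=yes]\
        FILENAME='mydata_geno.txt'; MAPFILENAME='mydata_map.txt';\
        EXCLUDEMARKERS=exclude; MKSCORES=mkscores;\
        CHROMOSOMES=chromosomes; POSITIONS=positions; MKNAMES=mknames;\
        MKERRORS=mkerrors; ERRORLOCATIONS=errlocations
```

After the import, mkerrors would hold the names of the markers with errors, and errlocations the marker and genotype names locating each erroneous score.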

The PRINT option specifies the output to be displayed, with settings:

    catalogue produces a summary listing attributes of the data that have been read and, for phenotypic data, a list of the data structures that have been imported,
    errorreport gives a report of any errors in genotypic data that have been read in Flapjack format.

Phenotypic data in MapQTL(R) quantitative data files (.qua) can be imported by supplying the name of the file with the PHEFILENAME parameter. The TRAITS parameter can be set to a pointer to store the identifiers (i.e. column names) read from the file. The pointer can then be used to refer to the variates containing the loaded data.
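
As a sketch, phenotypic data might be read alongside the genotypes as shown below. The file names and identifiers are hypothetical, and it is an assumption here that the genotypic and phenotypic files are supplied in the same call.

```genstat
"Sketch only: file names are hypothetical, and supplying the .loc,"
".map and .qua files in a single call is an assumption."
QIMPORT [PRINT=catalogue]\
        FILENAME='mypop.loc'; MAPFILENAME='mypop.map';\
        PHEFILENAME='mypop.qua';\
        MKSCORES=mkscores; TRAITS=traits
"Refer to all the imported trait variates through the pointer."
PRINT   traits[]
```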

The OUTFILENAME parameter can specify the name of a Genstat workbook (.gwb) file in which to save the marker scores and associated information.

Options: PRINT, POPULATIONTYPE, MISSING, SEPARATOR, ASEPARATOR, FJROWS, NPARENTS, NMKERROR, MKREMOVE.

Parameters: FILENAME, MAPFILENAME, PHEFILENAME, MKSCORES, TRAITS, CHROMOSOMES, POSITIONS, MKNAMES, MKSETS, IDMGENOTYPES, PARENTS, IDPARENTS, IDFILENAME, EXCLUDEMARKERS, MKERRORS, ERRORLOCATIONS, OUTFILENAME.

Method

See the QEXPORT procedure for further details of the file formats. Data in Flapjack format are read and checked using the Dataload dll, and the valid data are passed back to Genstat using temporary files.

See also

Procedures: IMPORT, QEXPORT, QIBDPROBABILITIES.

Commands for: Statistical genetics and QTL estimation.

Example

CAPTION    'QIMPORT example'; STYLE=meta
QIMPORT    [POPULATIONTYPE=F2]\
           FILENAME='%GENDIR%/Examples/F2maize_geno.txt';\
           MAPFILENAME='%GENDIR%/Examples/F2maize_map.txt';\
           MKSCORES=mgenotypes; CHROMOSOMES=linkagegroups;\
           POSITIONS=lpos; MKNAMES=markers;\
           PARENTS=parents; IDPARENTS=idparents
DQMAP      CHROMOSOMES=linkagegroups; POSITIONS=lpos; MKNAMES=markers
DQMKSCORES [POPULATIONTYPE=F2; PLOT=all] mgenotypes;\ 
           CHROMOSOMES=linkagegroups; PARENTS=parents; IDPARENTS=idparents

Updated on March 6, 2019
