Imports genotypic and phenotypic data for QTL analysis (D.A. Murray).
Options
| Option | Description |
|---|---|
| PRINT = string token | What to print (catalogue, errorreport); default cata, erro |
| POPULATIONTYPE = string token | Type of population (BC1, DH1, F2, RIL, BCxSy, CP, AMP); must be set for Flapjack and R/QTL formats |
| MISSING = text | Character representing a missing genotype in Flapjack or R/QTL format; default '-' |
| SEPARATOR = text | Character separating data values in Flapjack format; default separates them by tabs |
| ASEPARATOR = text | Character separating allele values in Flapjack format; default '/' |
| FJROWS = string token | Specifies whether the genotypes or the markers are stored on the rows in Flapjack format (genotypes, markers); default geno |
| NPARENTS = scalar | Number of parents in the Flapjack file; default 0 for population AMP, 4 for CP, and 2 otherwise |
| NMKERROR = scalar | For data in Flapjack format, sets a limit on the number of markers that may be found to contain errors before the import is abandoned; default 200 |
| MKREMOVE = string token | Whether to remove markers with errors in Flapjack format automatically (yes, no); default no |
Parameters
| Parameter | Description |
|---|---|
| FILENAME = texts | Name of the file for import |
| MAPFILENAME = texts | Name of the map file (Flapjack or MapQTL(R)) |
| PHEFILENAME = texts | Name of the phenotypic file (MapQTL(R)) |
| MKSCORES = pointers | Saves the genotype codes for each marker |
| TRAITS = pointers | Saves the trait data from the phenotypic file |
| CHROMOSOMES = factors | Saves linkage groups for each marker |
| POSITIONS = variates | Saves positions of the markers within linkage groups |
| MKNAMES = texts | Saves the marker names |
| MKSETS = factors | Saves marker sets |
| IDMGENOTYPES = texts | Labels for genotypes |
| PARENTS = pointers | Saves the parent information |
| IDPARENTS = texts | Saves the labels used to identify the parents |
| IDFILENAME = texts | Specifies a file containing genotype labels for MapQTL(R) files; if unset, they are assumed to be in the .loc file |
| EXCLUDEMARKERS = texts | Specifies the names of any markers to exclude from an import in Flapjack format |
| MKERRORS = texts | In Flapjack format, saves the names of any markers that contain errors |
| ERRORLOCATIONS = pointers | In Flapjack format, saves a pointer to texts that identify any errors in the marker-by-genotype (individual) scores |
| OUTFILENAME = texts | Specifies the name of a Genstat workbook (.gwb) file to save the marker scores and associated information |
Description
QIMPORT loads genotypic and phenotypic data for QTL analysis. The name of the genotypic data file to be imported is specified by the FILENAME parameter. The format of the file is determined by its extension, and can be either a Flapjack text genotype file (.txt), a MapQTL(R) Locus genotype file (.loc) or a comma-delimited text file (.csv). The .csv format is an extended version of the R/QTL separate genotype data .csv file format that can include an extra column for the marker sets.
If a Flapjack genotype or MapQTL(R) Locus genotype file name is supplied, the associated map information can be supplied by setting the MAPFILENAME parameter to a file name with the extension .txt for Flapjack or .map for MapQTL(R). For Flapjack and R/QTL formats, the POPULATIONTYPE option must be set to specify the population from which the genotypes come. For MapQTL(R), the population is determined from the .loc file.

The MISSING option can specify a character to identify missing genotypes in Flapjack genotype files and R/QTL files. By default, Genstat expects the genotype data in Flapjack files to be tab-delimited, but the SEPARATOR option can be used to specify an alternative separator. Similarly, by default, Genstat expects the alleles for each genotype to be separated by a '/' character, but an alternative can be supplied using the ASEPARATOR option. For the Flapjack genotype format, the FJROWS option indicates whether the genotypes or the markers are stored in the rows of the file; by default the genotypes are in the rows.
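As a minimal sketch, the Flapjack options above might be combined as follows for a hypothetical RIL file in which values are separated by commas, alleles by ':', missing genotypes are coded '?' and each marker occupies one row. The file names and the identifiers scores, chr, pos and mk are illustrative assumptions, not part of the supplied examples.

```genstat
" Hypothetical RIL data: comma-separated values, ':' between alleles, "
" '?' for missing genotypes and one marker per row of the file        "
QIMPORT [POPULATIONTYPE=RIL; SEPARATOR=','; ASEPARATOR=':';\
        MISSING='?'; FJROWS=markers]\
        FILENAME='myril_geno.txt'; MAPFILENAME='myril_map.txt';\
        MKSCORES=scores; CHROMOSOMES=chr; POSITIONS=pos; MKNAMES=mk
```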
The marker scores for the genotypes are stored in a set of factors in the pointer supplied by the MKSCORES parameter. Each factor within the pointer contains the data for one marker, with factor labels supplied in the same order.
When importing genotypic data, the linkage groups for each marker, the marker names and the marker positions are saved using the CHROMOSOMES, MKNAMES and POSITIONS parameters, respectively. If a .csv file is imported, the grouping factor identifying any marker sets within the file can be saved using the MKSETS parameter.
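A sketch of a .csv import that also keeps the marker sets, assuming a hypothetical doubled-haploid file mydh_geno.csv in the extended R/QTL layout; the file name and the identifiers (including mksets) are chosen only for illustration.

```genstat
" Hypothetical extended R/QTL file with an extra marker-set column "
QIMPORT [POPULATIONTYPE=DH1; MISSING='-']\
        FILENAME='mydh_geno.csv';\
        MKSCORES=scores; CHROMOSOMES=chr; POSITIONS=pos;\
        MKNAMES=mk; MKSETS=mksets
```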
For BC1, DH1, F2, RIL, BCxSy and CP populations, the parent information and the associated names can be saved using the PARENTS and IDPARENTS parameters, respectively.
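For a cross-pollinated population the parental information can be saved in the same way. The sketch below assumes a hypothetical Flapjack file mycp_geno.txt that contains the parental scores; NPARENTS=4 is already the default for CP and is set here only to make the number of parents explicit.

```genstat
" Hypothetical Flapjack file for a CP population; NPARENTS=4 is the "
" default for CP and is set here only to make it explicit           "
QIMPORT [POPULATIONTYPE=CP; NPARENTS=4]\
        FILENAME='mycp_geno.txt'; MAPFILENAME='mycp_map.txt';\
        MKSCORES=scores; CHROMOSOMES=chr; POSITIONS=pos; MKNAMES=mk;\
        PARENTS=par; IDPARENTS=idpar
PRINT   idpar
```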
The genotype labels can be saved using the IDMGENOTYPES parameter. By default, for MapQTL(R) locus and map files, the genotype labels are the values 1 to n, where n is the number of individuals. However, Genstat allows individual names to be included at the bottom of the locus file, below the genotype data. The file should then include the line

Individual names:

followed by each individual name on a separate line, in the same order as that in which the genotypes are specified for each locus. Alternatively, a text file containing the genotype labels can be supplied using the IDFILENAME parameter; each individual name should then be on a separate line, in the same order as that in which the genotypes are specified for each locus in the .loc file.
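A minimal sketch of a MapQTL(R) import that reads the genotype labels from a separate file, assuming hypothetical files cross.loc, cross.map and cross_names.txt (one individual name per line). POPULATIONTYPE is not set because the population is taken from the .loc file.

```genstat
" Hypothetical MapQTL(R) files; the population type is read from the "
" .loc file, so POPULATIONTYPE is not set                             "
QIMPORT FILENAME='cross.loc'; MAPFILENAME='cross.map';\
        IDFILENAME='cross_names.txt';\
        MKSCORES=scores; CHROMOSOMES=chr; POSITIONS=pos;\
        MKNAMES=mk; IDMGENOTYPES=genolabels
```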
For data in Flapjack format, markers can be excluded by setting the EXCLUDEMARKERS parameter to a text containing the names of the markers to omit. When Flapjack genotypic data are imported, the parental and individual scores are checked for errors. Setting option MKREMOVE=yes automatically removes any markers found to contain errors from the imported data. The NMKERROR option sets a limit on the number of markers that may be found to contain errors before the import is abandoned; by default this is 200. The names of any markers that contain errors in the parent or individual genotype scores can be saved, in a text, using the MKERRORS parameter. The ERRORLOCATIONS parameter can save a pointer containing a text with marker names and a text with genotype names, identifying the marker × genotype locations of any marker score errors.
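The error-handling settings can be combined as in the following sketch. The file names and the marker names mk017 and mk233 are hypothetical; the PRINT statements assume that at least one error was found, since MKERRORS and ERRORLOCATIONS may have nothing to save otherwise.

```genstat
" Omit two named markers, remove any other markers with errors, and "
" abandon the import if more than 50 markers contain errors         "
QIMPORT [PRINT=catalogue,errorreport; POPULATIONTYPE=F2;\
        MKREMOVE=yes; NMKERROR=50]\
        FILENAME='myf2_geno.txt'; MAPFILENAME='myf2_map.txt';\
        MKSCORES=scores; CHROMOSOMES=chr; POSITIONS=pos; MKNAMES=mk;\
        EXCLUDEMARKERS=!t('mk017','mk233');\
        MKERRORS=badmk; ERRORLOCATIONS=errloc
" Names of faulty markers, then the marker and genotype of each error "
PRINT   badmk
PRINT   errloc[]
```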
The PRINT option specifies the output to be displayed, with settings:

| Setting | Output |
|---|---|
| catalogue | produces a summary listing attributes of the data that have been read and, for phenotypic data, a list of the data structures that have been imported. |
| errorreport | gives a report of any errors in genotypic data that have been read in Flapjack format. |
Phenotypic data in MapQTL(R) quantitative data files (.qua) can be imported by supplying the name of the file with the PHEFILENAME parameter. The TRAITS parameter can be set to a pointer to store the identifiers (i.e. column names) read from the file. The pointer can then be used to refer to the variates containing the loaded data.
The OUTFILENAME parameter can specify the name of a Genstat workbook (.gwb) file in which to save the marker scores and associated information.
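For example, a minimal sketch that saves an F2 import to a workbook, assuming hypothetical input files and the hypothetical output name myf2_markers.gwb:

```genstat
" Save the imported marker data to a hypothetical workbook file "
QIMPORT [POPULATIONTYPE=F2]\
        FILENAME='myf2_geno.txt'; MAPFILENAME='myf2_map.txt';\
        MKSCORES=scores; CHROMOSOMES=chr; POSITIONS=pos; MKNAMES=mk;\
        OUTFILENAME='myf2_markers.gwb'
```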
Options: PRINT, POPULATIONTYPE, MISSING, SEPARATOR, ASEPARATOR, FJROWS, NPARENTS, NMKERROR, MKREMOVE.
Parameters: FILENAME, MAPFILENAME, PHEFILENAME, MKSCORES, TRAITS, CHROMOSOMES, POSITIONS, MKNAMES, MKSETS, IDMGENOTYPES, PARENTS, IDPARENTS, IDFILENAME, EXCLUDEMARKERS, MKERRORS, ERRORLOCATIONS, OUTFILENAME.
Method
See the QEXPORT procedure for further details of the file formats. Data in Flapjack format are read and checked using the Dataload dll, and the valid data are passed back to Genstat using temporary files.
See also
Procedures: IMPORT, QEXPORT, QIBDPROBABILITIES.
Commands for: Statistical genetics and QTL estimation.
Example
```
CAPTION 'QIMPORT example'; STYLE=meta
QIMPORT [POPULATION=F2]\
        FILENAME='%GENDIR%/Examples/F2maize_geno.txt';\
        MAPFILENAME='%GENDIR%/Examples/F2maize_map.txt';\
        MKSCORES=mgenotypes; CHROMOSOMES=linkagegroups;\
        POSITIONS=lpos; MKNAMES=markers;\
        PARENTS=parents; IDPARENTS=idparents
DQMAP   CHROMOSOMES=linkagegroups; POSITIONS=lpos; MKNAMES=markers
DQMKSCORES [POPULATIONTYPE=F2; PLOT=all] mgenotypes;\
        CHROMOSOMES=linkagegroups; PARENTS=parents; IDPARENTS=idparents
```