Genotype and map file format

The genotypic data for a QTL or association analysis are supplied in two files: a genotype file containing the marker scores and a map file containing the associated map information. The file formats used in Genstat are based on the Flapjack genotype and map data formats. Both files are plain text files, and their formats are described below.

Map file

The map file contains the marker locations (linkage group and position within the linkage group). The file consists of three columns with no header row (column names). Column 1 gives the marker name, column 2 the chromosome on which the marker has been mapped, and column 3 the position of the marker within that chromosome. The columns should be tab separated. When the map file is opened, it is sorted by chromosome, then by position within chromosome, and then alphabetically by marker name for markers that occur at the same position.

M1	1	0.0
M2	1	8.7
M3	1	10.7
M4	1	18.93
M5	1	19.17
M6	1	19.25
M7	1	20.11
M8	1	25.49
M9	1	27.24
...	...	...
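
The layout and sort order described above can be illustrated outside Genstat with a short Python sketch. This is not Genstat's own reader: the function names read_map and chromosome_key are hypothetical, and treating numeric chromosome labels as numbers (so that 2 sorts before 10) is an assumption, since the text above does not say how chromosome labels are compared.

```python
import io

def chromosome_key(label):
    # Assumption: numeric labels sort numerically (2 before 10);
    # any other label sorts as text after the numeric ones.
    return (0, int(label), "") if label.isdigit() else (1, 0, label)

def read_map(handle):
    """Parse a three-column, tab-separated map file that has no header row."""
    markers = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        name, chromosome, position = (f.strip() for f in fields)
        markers.append((name, chromosome, float(position)))
    # Sort by chromosome, then position, then marker name for ties.
    markers.sort(key=lambda m: (chromosome_key(m[1]), m[2], m[0]))
    return markers

# Three rows from the example above, deliberately out of order.
sample = "M3\t1\t10.7\nM1\t1\t0.0\nM2\t1\t8.7\n"
map_rows = read_map(io.StringIO(sample))
for name, chromosome, position in map_rows:
    print(name, chromosome, position)
```

Each marker is returned as a (name, chromosome, position) tuple, so the sorted list can be passed directly to whatever checks or plots are needed before the file is loaded into Genstat.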

Genotype file

The genotype file contains the marker scores of each individual in the population. The file is a genotype by marker matrix, with individuals in the rows and markers in the columns. The first column holds the parent and genotype names, and the first row holds the marker names. The content of the following rows depends on the population type. For F2, DH1, BC1, BCxSy and RIL populations, rows 2 and 3 correspond to the parents of the cross. For a CP population, rows 2, 3, 4 and 5 should contain the parent allele information. For an association mapping population, no parental genotypes are supplied (as the founder genotypes are not known). The remaining rows in the file correspond to the genotypes of the population.

The marker genotypes are coded by two characters corresponding to the alleles, with a separator between them (by default a slash, /). If a single character is given, the genotype is assumed to be homozygous. Missing values are indicated by the ‘-’ character by default, although another character may be used. The following example shows a genotype file for an F2 population. In this example the two alleles have been called 1 and 2 (useful for linking alleles to their origin, i.e. parent 1 or parent 2). Therefore, 1 corresponds to homozygous for allele 1 (synonymous with 1/1), 1/2 corresponds to heterozygous, and 2 corresponds to homozygous for allele 2 (synonymous with 2/2).

Missing values are indicated by - (synonymous with -/-). In the case of partially informative markers (e.g. dominant markers), genotypes are coded as 1/- or 2/-, depending on whether the dominant allele originated from parent 1 or parent 2. In the example, the first row contains the names of 9 markers. The parent allele information for the cross is supplied in rows 2 and 3, where PARENT_1 is the name of parent 1 and PARENT_2 is the name of parent 2. The genotypes are contained in the remaining rows, whose names start with the prefix GENO_.

		M1 	M2 	M3 	M4 	M5 	M6 	M7 	M8	M9
PARENT_1 	1 	1 	1 	1 	1 	1 	1 	1 	1
PARENT_2 	2 	2 	2 	2 	2 	2 	2 	2 	2
GENO_001 	1 	1/- 	- 	1 	1 	1 	1/2 	2/- 	2
GENO_003 	2 	1/- 	2 	2	2 	1 	2 	1 	2
GENO_004 	1 	2 	1 	1 	1 	2 	1/2 	2/- 	1
GENO_005 	1/2 	1/- 	1 	1/2 	1 	1 	1/2 	2/- 	2
GENO_007 	1 	1/- 	1 	- 	1 	1 	2 	2/- 	1
GENO_008 	2 	2 	2 	2 	2 	1 	1 	1 	1
GENO_009 	- 	2 	2 	1/2 	2 	1 	2 	2/- 	2
GENO_010 	1 	1/- 	1/2 	1 	- 	1 	1 	1 	1
GENO_011 	1/2 	1/- 	2 	2 	2 	1 	2 	2/- 	2
GENO_012 	1 	2 	1/2 	1 	1 	1 	1 	1 	2
GENO_013 	1/2 	1/- 	1 	1/2 	1 	1 	1 	2/- 	2
GENO_014 	1 	2 	1 	1 	1 	2 	2 	2/- 	2
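
The coding rules above can be made concrete with a small Python sketch that reads a genotype file and expands each score into a pair of alleles. It is illustrative only and is not how Genstat itself reads the file. The names decode and read_genotypes are hypothetical, None is used here to mark an unknown allele, and splitting rows on any whitespace is an assumption that only holds when the parent, genotype and marker names contain no spaces.

```python
import io

def decode(code, sep="/", missing="-"):
    """Expand one marker score into a pair of alleles (None = unknown allele)."""
    code = code.strip()
    # A single character is shorthand for a homozygote: "1" means "1/1".
    parts = code.split(sep) if sep in code else [code, code]
    if len(parts) != 2:
        raise ValueError(f"unexpected marker score: {code!r}")
    return tuple(None if p == missing else p for p in parts)

def read_genotypes(handle, n_parents=2):
    """Read marker names, parent rows and progeny rows from a genotype file.

    n_parents is 2 for the F2 example; per the text it would be 4 rows for
    a CP population and 0 for an association mapping population.
    """
    lines = [ln for ln in handle if ln.strip()]
    markers = lines[0].split()  # first row: marker names only
    parents, progeny = {}, {}
    for i, line in enumerate(lines[1:]):
        name, *codes = line.split()
        if len(codes) != len(markers):
            raise ValueError(f"{name}: expected {len(markers)} scores")
        target = parents if i < n_parents else progeny
        target[name] = dict(zip(markers, (decode(c) for c in codes)))
    return markers, parents, progeny

# First three markers of selected rows from the example above.
sample = (
    "\tM1\tM2\tM3\n"
    "PARENT_1\t1\t1\t1\n"
    "PARENT_2\t2\t2\t2\n"
    "GENO_001\t1\t1/-\t-\n"
    "GENO_005\t1/2\t1/-\t1\n"
)
markers, parents, progeny = read_genotypes(io.StringIO(sample))
print(progeny["GENO_001"])
# {'M1': ('1', '1'), 'M2': ('1', None), 'M3': (None, None)}
```

Representing every score as a pair keeps homozygous, heterozygous, partially informative and missing genotypes in one shape, which makes it easy to count missing alleles per marker or to check that progeny alleles match those of the parents.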

Updated on April 25, 2019
