GGEBIPLOT procedure

Plots displays to assess genotype + genotype-by-environment variation (A.I. Glaser).

Options

PRINT = string tokens What to print (variation); default * i.e. nothing
DIMENSIONS = scalars Which dimensions to display; default 1,2
PLOT = string token Type of plot (scatter, ranking, compare, joint, centred); default scat
METHOD = string token Whether the names in LEV1 (and LEV2) are from the ENVIRONMENTS or GENOTYPES factor (environments, genotypes); default envi
SCPLOT = string tokens Features to add to a scatter plot (hull, sector, megaenvironment, vector, linear); default * i.e. none
SCALING = string tokens What scaling to use (genotype, environment, symmetric); default envi
NORMALIZE = string token Whether to scale the data using the within-environment standard deviation (yes, no); default no
CULL = variate or text Specifies environments at which to examine the performance of the genotypes in order to decide which genotypes to cull
QUANTILE = scalar Proportion at which to calculate quantile for CULL; default 0.5
DIVISIONS = scalar Number of parallel lines or concentric circles to use when ranking genotypes or environments; default 10
RANKINGLINES = string token Whether the ranking lines drawn with PLOT settings ranking or joint are perpendicular to the biplot axis or projected onto the axis (perpendicular, projection); default perp
GENREVERSE = string token Whether to reverse the order of the genotype scores (yes, no); default no
ENVREVERSE = string token Whether to reverse the order of the environment scores (yes, no); default no
WINDOW = scalar Which graphical window to use; default 1
KEYWINDOW = scalar Window number for the key (zero for no key); default 2

Parameters

DATA = variates or tables Provides the data to be analysed
GENOTYPES = factors Specifies the genotypes
ENVIRONMENTS = factors Specifies the environments
LEV1 = texts or scalars First environment (or genotype) to use with PLOT settings centred, compare, joint or ranking, or with scatter when SCPLOT=linear
LEV2 = texts or scalars Second environment (or genotype) to use with PLOT settings centred, compare or joint
LABGENOTYPES = texts Labels for genotypes
LABENVIRONMENTS = texts Labels for environments
TITLE = texts Titles for the plots; if this is unset, an appropriate title is formed automatically
MEGAGROUPS = variates or texts Specifies or saves the groupings to use for the plot produced by SCPLOT=megaenvironment

Description

GGEBIPLOT provides a range of plots that are useful for assessing the performance of genotypes in different environments. The observed phenotypic variation (P) of genotypes across environments is made up of environment variations (E), genotype variations (G) and genotype-by-environment interaction (GE): i.e.

P = E + G + GE.

Usually E is the dominant source of variation, while G and GE are relatively small. Thus, it is usual to remove the environmental main effect E, and focus only on G and GE.
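
The reasoning can be written out under a simple additive model. The notation is an assumption introduced here for illustration and does not come from the procedure itself: y_ij is the value for genotype i in environment j, mu is the grand mean, e_j and g_i are the environment and genotype main effects, and (ge)_ij is the interaction, with the genotype and interaction effects constrained to sum to zero over the genotypes within each environment.

```latex
\begin{aligned}
y_{ij} &= \mu + e_j + g_i + (ge)_{ij}, \\
\bar{y}_{\cdot j} &= \mu + e_j, \\
y_{ij} - \bar{y}_{\cdot j} &= g_i + (ge)_{ij}.
\end{aligned}
```

Under these assumptions, subtracting each environment mean leaves only the G and GE terms.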

The data for GGEBIPLOT is a table of data values, classified by genotype and environment factors, and specified by the DATA parameter. The genotype and environment factors are specified by the GENOTYPES and ENVIRONMENTS parameters. You can set DATA to the table itself. Alternatively, you can set it to a variate containing the raw data, and GGEBIPLOT will form the table as a table of means.
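
As a minimal sketch of the two ways of supplying DATA, the following assumes that the variate Yield and the factors Genotype and Env from the Example section have already been defined. The table name YieldTable is hypothetical, and forming the table with TABULATE is one possible approach rather than something the procedure requires.

```genstat
" Option 1: supply the raw variate; GGEBIPLOT forms the table of means "
GGEBIPLOT Yield; GENOTYPES=Genotype; ENVIRONMENTS=Env

" Option 2: form the genotype-by-environment table of means beforehand "
" YieldTable is a hypothetical identifier "
TABULATE [CLASSIFICATION=Genotype,Env] Yield; MEANS=YieldTable
GGEBIPLOT YieldTable; GENOTYPES=Genotype; ENVIRONMENTS=Env
```

With one observation per genotype-environment combination, as in the Example data, both forms should describe the same table.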

GGEBIPLOT illustrates the genotype plus genotype-by-environment variation using scores from a principal components analysis, treating the table as a data matrix. The rows (or units) of the data matrix correspond to the genotypes, and the columns (or variates) correspond to the environments. The analysis works on the matrix of variances and covariances between environments. The environment means are automatically removed during the calculation of the variances and covariances. So the analysis automatically ensures that it is only the genotype variation and genotype-by-environment interaction that is examined. You can also scale the columns first, using the within-environment standard deviation, by setting option NORMALIZE=yes. Usually the scores are taken from the first two dimensions of the decomposition, but you can request others by setting the DIMENSIONS option. You can set option PRINT=variation to print the amount of variation explained by these two dimensions; by default, nothing is printed.
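
The options described above might be combined as in the sketch below, which again assumes the Yield, Genotype and Env structures from the Example section. The choice of dimensions 1 and 3 is purely illustrative.

```genstat
" Print the variation explained, after scaling each environment by its "
" within-environment standard deviation "
GGEBIPLOT [PRINT=variation; NORMALIZE=yes] Yield; GENOTYPES=Genotype;\
          ENVIRONMENTS=Env

" Display dimensions 1 and 3 instead of the default 1 and 2 "
GGEBIPLOT [PRINT=variation; DIMENSIONS=1,3] Yield; GENOTYPES=Genotype;\
          ENVIRONMENTS=Env
```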

GGEBIPLOT plots the scores in a range of different ways, together with biplot axes from the principal components analysis. Essentially these are standard principal-component biplots, but various additional information can be added to the plots, as suggested in the book GGE Biplot Analysis by Yan & Kang (2003), to help elucidate the genotype and environment relationships.

The PLOT option controls the plots that are displayed. The setting scatter plots the genotype and environment scores. The SCPLOT option allows further information to be included on the plot, with settings:

    hull to draw an enclosing convex hull around the genotype scores;
    sector to draw lines from the origin perpendicular to each side of the convex hull around the genotype scores, to divide the biplot into sectors;
    megaenvironment to draw an ellipse round those environments which share the same sector;
    vector to draw lines connecting environment scores with the origin;
    linear to draw the same lines as vector, together with a rug plot at the side showing the angles between the environments; the parameter LEV1 must then be set to the label (or level) of the environment that is to be used as the “base”.

Note that hull, sector and megaenvironment can be used together, but vector and linear must be used individually. For single-trait data, genotypes at the vertex of the convex hull are considered to be the best performers in the environments that occur in the same sector (these are known as the vertex cultivars). The sector setting splits the plots into different sectors. The genotypes in the same sector as a particular environment should be those with higher yields in that environment. As a general rule, the vertex cultivar will be the highest-yielding genotype in all environments with which it shares a sector. The megaenvironment setting draws an ellipse around those environments which share a sector (if the ellipse extends into another sector and sector lines are plotted, the ellipse lines become dashed when they go into a different sector).
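
The SCPLOT settings could be requested as follows. This is a sketch that assumes the Yield, Genotype and Env structures from the Example section; the use of environment a as the base for the linear setting is an arbitrary choice.

```genstat
" Scatter biplot with convex hull, sector lines and megaenvironment ellipses "
GGEBIPLOT [PLOT=scatter; SCPLOT=hull,sector,megaenvironment] Yield;\
          GENOTYPES=Genotype; ENVIRONMENTS=Env

" Environment vectors only: vector is not combined with other settings "
GGEBIPLOT [PLOT=scatter; SCPLOT=vector] Yield; GENOTYPES=Genotype;\
          ENVIRONMENTS=Env

" Vectors plus the rug plot of angles, with environment a as the base "
GGEBIPLOT [PLOT=scatter; SCPLOT=linear] Yield; GENOTYPES=Genotype;\
          ENVIRONMENTS=Env; LEV1='a'
```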

The MEGAGROUPS parameter can be used to specify or save the groups used for the megaenvironment setting. To specify the groups, you can set MEGAGROUPS to a variate or text with the same length as the number of levels of the ENVIRONMENTS factor; its values indicate the group to which each environment belongs. Alternatively, if MEGAGROUPS is set to an undefined data structure, or one with no values, this will be defined as a variate containing the default group definitions.
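
Both uses of MEGAGROUPS are sketched below, assuming the Yield, Genotype and Env structures from the Example section. The identifiers Groups and MyGroups, and the group numbers given to the seven environments, are hypothetical.

```genstat
" Save the default groupings: Groups is declared with no values "
VARIATE Groups
GGEBIPLOT [PLOT=scatter; SCPLOT=megaenvironment] Yield; GENOTYPES=Genotype;\
          ENVIRONMENTS=Env; MEGAGROUPS=Groups
PRINT Groups

" Supply hypothetical groupings: one value per environment level, a to g "
VARIATE [VALUES=1,1,2,2,2,3,3] MyGroups
GGEBIPLOT [PLOT=scatter; SCPLOT=megaenvironment] Yield; GENOTYPES=Genotype;\
          ENVIRONMENTS=Env; MEGAGROUPS=MyGroups
```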

The PLOT setting ranking can examine the performances of all the genotypes within a specific environment. Alternatively, you can set option METHOD=genotype to examine all the environments for a specific genotype. This draws a biplot axis through the specific environment (or genotype) together with ranking lines to show the best performing genotypes (or environments) in that environment (or genotype). By default the ranking lines are drawn to be perpendicular to the biplot axis, but you can set option RANKINGLINES=projection to project lines from the genotypes (or environments) to the biplot axis instead. In the plot, the best performing genotypes (or environments) are those whose projections onto the biplot axis are closest to the environment (or genotype). The required environment (or genotype) is specified by setting the parameter LEV1 to either the label or level of the required environment (or genotype). If LEV1 is unset or is set to a missing value, an axis is drawn through the “average environment coordinate” (AEC), with the appropriate ranking lines. The AEC is represented by a circle on the plot.
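
A sketch of the three ranking variants, assuming the Yield, Genotype and Env structures from the Example section. Environment a and genotype A are arbitrary choices made for illustration.

```genstat
" Rank the genotypes within environment a "
GGEBIPLOT [PLOT=ranking] Yield; GENOTYPES=Genotype; ENVIRONMENTS=Env;\
          LEV1='a'

" Rank the environments for genotype A, with projected ranking lines "
GGEBIPLOT [PLOT=ranking; METHOD=genotypes; RANKINGLINES=projection] Yield;\
          GENOTYPES=Genotype; ENVIRONMENTS=Env; LEV1='A'

" LEV1 unset: the axis is drawn through the average environment coordinate "
GGEBIPLOT [PLOT=ranking] Yield; GENOTYPES=Genotype; ENVIRONMENTS=Env
```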

The PLOT setting compare can compare the performance of the environments with a specific environment, or you can set option METHOD=genotype to compare the genotypes with a specific genotype. The specific environment (or genotype) is viewed as an “ideal” environment (or genotype), and concentric circles are plotted around it. The closer an environment (or genotype) is to the “ideal” environment (or genotype) the more attributes they share. The required environment (or genotype) is specified by setting the parameter LEV1 to either the label or level of the required environment (or genotype). If LEV1 is unset or is set to a missing value, GGEBIPLOT constructs an “ideal” environment (or genotype), and draws concentric circles from its point. The constructed “ideal” environment (or genotype) lies on the line that joins the origin to the AEC, at a distance from the origin equal to the distance from the origin to the environment (or genotype) with the greatest yield. (The “ideal” environment or genotype considers only those environments or genotypes that show greater than average yield.) The “ideal” environment (or genotype) is represented by an arrow on the plot. In practice the “ideal” is unlikely to exist, but can be used as a reference point. It is also possible to see where the AEC is in relation to the “ideal” genotype (or environment) by setting LEV2 to a missing value.

The major difference between ranking and compare is that ranking shows the best performing environments (or genotypes) in a genotype (or environment) in a single dimension, whilst compare shows the best performing genotypes (or environments) in comparison to an “ideal” genotype (or environment) in two dimensions. The DIVISIONS option specifies the number of lines, or concentric circles, to use when ranking genotypes or environments with PLOT settings ranking or compare; the default is to use 10.
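
The compare setting and the DIVISIONS option might be used as in this sketch, which assumes the Yield, Genotype and Env structures from the Example section. The value of 5 divisions and the choice of environment a are illustrative only.

```genstat
" Compare the environments with environment a, using 5 concentric circles "
GGEBIPLOT [PLOT=compare; DIVISIONS=5] Yield; GENOTYPES=Genotype;\
          ENVIRONMENTS=Env; LEV1='a'

" Compare the genotypes with a constructed ideal genotype: LEV1 is unset "
GGEBIPLOT [PLOT=compare; METHOD=genotypes] Yield; GENOTYPES=Genotype;\
          ENVIRONMENTS=Env
```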

The PLOT setting joint can be used to compare two environments simultaneously, or you can set option METHOD=genotype to compare two genotypes. When comparing two environments, a line is drawn joining the environments. A median point on this line is found, which acts as a virtual trait. A biplot axis is plotted passing through this median and the origin. Ranking lines are also drawn to the biplot axis, as with the PLOT setting ranking; the RANKINGLINES option again controls whether these are perpendicular to the axis or projected onto the axis. The genotypes that are furthest along the biplot axis (in the direction of the arrow) are considered to be the best performing genotypes in the two environments. Alternatively, when comparing two genotypes, a line is drawn joining the genotypes. An axis is now drawn through the origin perpendicular to this joining line. The environments on the same side of the axis as one of the chosen genotypes are those where that genotype is considered to have a better performance. In some circumstances both genotypes may end up on the same side of the axis. The genotype that is closest to the axis is then considered to have a better performance in the environments on the other side of the perpendicular line. The two environments (or genotypes) are specified by setting LEV1 and LEV2 to their levels or labels.

The PLOT setting centred can produce a scatter plot of the environment-centred data, with the x and y-axes representing two of the environments. In this case only the genotypes are plotted. Alternatively, you can set METHOD=genotype to produce a plot of the genotype-centred environment data, with the x and y-axes representing two of the genotypes. The line y=x is also plotted. Genotypes (or environments) below this line perform better in the environment (or genotype) representing the x-axis, and genotypes (or environments) above this line perform better in the environment (or genotype) representing the y-axis. The two environments (or genotypes) are again specified by setting LEV1 and LEV2 to their levels or labels.
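
The joint and centred settings both take a pair of levels through LEV1 and LEV2. The sketch below assumes the Yield, Genotype and Env structures from the Example section, and the pairs chosen (a with b, A with B) are arbitrary.

```genstat
" Joint comparison of environments a and b "
GGEBIPLOT [PLOT=joint] Yield; GENOTYPES=Genotype; ENVIRONMENTS=Env;\
          LEV1='a'; LEV2='b'

" Joint comparison of genotypes A and B "
GGEBIPLOT [PLOT=joint; METHOD=genotypes] Yield; GENOTYPES=Genotype;\
          ENVIRONMENTS=Env; LEV1='A'; LEV2='B'

" Environment-centred scatter plot with axes for environments a and b "
GGEBIPLOT [PLOT=centred] Yield; GENOTYPES=Genotype; ENVIRONMENTS=Env;\
          LEV1='a'; LEV2='b'
```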

When there are a large number of genotypes it may be helpful to cull some of them from the biplot. For example, you may want to remove genotypes that have performed badly in some of the environments. To do this you set CULL to a variate or a text containing the levels or labels of the environments that you want to consider. Then, by default, all genotypes with y-values less than the median value at each chosen environment will be removed. Alternatively, you can specify some other quantile at which to cull by using the QUANTILE option. Note, however, if you select more than one environment when the y-values at the environments are negatively correlated, there may be very few (or possibly no) genotypes left to plot.
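
Culling could be requested as sketched here, assuming the Yield, Genotype and Env structures from the Example section. The environments a and b and the proportion 0.25 are illustrative values.

```genstat
" Cull genotypes below the median (default QUANTILE=0.5) in environment a "
GGEBIPLOT [CULL=!T(a)] Yield; GENOTYPES=Genotype; ENVIRONMENTS=Env

" Cull genotypes below the 0.25 quantile in each of environments a and b "
GGEBIPLOT [CULL=!T(a,b); QUANTILE=0.25] Yield; GENOTYPES=Genotype;\
          ENVIRONMENTS=Env
```

Because a genotype has to pass the threshold in every listed environment, adding environments to CULL can only reduce the number of genotypes plotted.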

The GENREVERSE and ENVREVERSE options can reverse the y-direction in the plots of the genotype and environment scores, respectively.

By default, the genotype and environment scores are labelled by the labels of the GENOTYPES and ENVIRONMENTS factors, if available, or otherwise by their levels. Alternatively, you can specify other labels using the LABENVIRONMENTS and LABGENOTYPES parameters.
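
A sketch of custom labelling combined with a reversed genotype direction, assuming the Yield, Genotype and Env structures from the Example section. The texts GenLabels and EnvLabels, their values and the title are hypothetical, and the sketch assumes that one label is supplied per factor level, in level order.

```genstat
" Hypothetical display labels, one per level of each factor "
TEXT [VALUES='G1','G2','G3','G4','G5','G6','G7'] GenLabels
TEXT [VALUES='E1','E2','E3','E4','E5','E6','E7'] EnvLabels

" Reverse the genotype scores and use the custom labels and title "
GGEBIPLOT [GENREVERSE=yes] Yield; GENOTYPES=Genotype; ENVIRONMENTS=Env;\
          LABGENOTYPES=GenLabels; LABENVIRONMENTS=EnvLabels;\
          TITLE='GGE biplot with custom labels'
```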

Options: PRINT, DIMENSIONS, PLOT, METHOD, SCPLOT, SCALING, NORMALIZE, CULL, QUANTILE, DIVISIONS, RANKINGLINES, GENREVERSE, ENVREVERSE, WINDOW, KEYWINDOW.

Parameters: DATA, GENOTYPES, ENVIRONMENTS, LEV1, LEV2, LABGENOTYPES, LABENVIRONMENTS, TITLE, MEGAGROUPS.

Method

GGEBIPLOT calculates a principal components analysis on the data variates, which automatically column-centres the data thus removing the environmental effects. The eigenvectors for genotype i and/or the eigenvectors for environment j are multiplied by a constant to get environment and genotype scores. The constant is chosen by setting the SCALING option as follows:

    genotype λi × ith environmental eigenvector
    environment λi × ith genotype eigenvector
    symmetric genotype scores scaled by √λi × ith environmental eigenvector, environment scores scaled by √λi × ith genotype eigenvector

where {λi} are the singular values of the data, with the values of i set by DIMENSIONS.

The singular values are equivalent to multiplying the roots from a principal components analysis by (n-1) and then raising to the power of -½. The eigenvectors for the genotypes are obtained by multiplying the scores from a principal components analysis by a diagonal matrix containing the singular values. The environmental eigenvectors are calculated by multiplying the data by the inverse of (the genotype eigenvectors multiplied by the singular values).

The genotype-focused scaling is used to display the interrelationships of the genotypes. The environment-focused scaling is probably used most frequently. It displays the interrelationship among environments, and has the following properties.

(1)  The cosine of the angle between any two environments approximates their correlation.

(2)  The lengths of the environment vectors are approximately proportional to their standard deviations.

(3)  The inner product between two environments approximates their covariance.

The symmetric scaling method allows for comparisons of the relative variances between the genotypes and environments.
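
The three scalings can be compared by drawing the same scatter plot under each setting. This sketch assumes the Yield, Genotype and Env structures from the Example section.

```genstat
" Genotype-focused scaling: interrelationships of the genotypes "
GGEBIPLOT [SCALING=genotype] Yield; GENOTYPES=Genotype; ENVIRONMENTS=Env

" Environment-focused scaling: the default setting "
GGEBIPLOT [SCALING=environment] Yield; GENOTYPES=Genotype; ENVIRONMENTS=Env

" Symmetric scaling: singular values shared between the two sets of scores "
GGEBIPLOT [SCALING=symmetric] Yield; GENOTYPES=Genotype; ENVIRONMENTS=Env
```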

References

Yan, W. & Kang, M.S. (2003). GGE Biplot Analysis: a Graphical Tool for Breeders, Geneticists and Agronomists. CRC Press, Boca Raton.

Hunt, L.A. & Yan, W. (2002). Biplot analysis of diallel data. Crop Science, 42, 21-30.

See also

Procedures: AMMI, GESTABILITY, RFINLAYWILKINSON, DBIPLOT, CABIPLOT, CRBIPLOT, CRTRIPLOT.

Commands for: REML analysis of linear mixed models, Graphics.

Example

CAPTION 'GGEBIPLOT example',!t('Data from Hunt & Yan (2002)',\
        'Fusarium head blight of seven winter',\
        'wheat genotypes and their F1 hybrids'); STYLE=meta,plain
VARIATE [NVALUES = 49; VALUES =\ 
27.5, 35.7, 46.4, 53.7, 33.3, 64.9, 43.3,\
35.7, 37.5, 46.2, 40.8, 51.9, 45.6, 57.5,\
46.4, 46.2, 38.7, 49.1, 50.4, 55.6, 69.4,\
53.7, 40.8, 49.1, 51.2, 49.4, 48.1, 57.5,\
33.3, 51.9, 50.4, 49.4, 42.5, 63.1, 68.9,\
64.9, 45.6, 55.6, 48.1, 63.1, 60.0, 63.1,\
43.3, 57.5, 69.4, 57.5, 68.9, 63.1, 43.7] Yield
FACTOR   [NVALUES=49; LABELS=!T(a,b,c,d,e,f,g)] Env
FACTOR   [NVALUES=49; LABELS=!T(A,B,C,D,E,F,G)] Genotype
GENERATE Env,Genotype
GGEBIPLOT Yield; GEN=Genotype; ENVIRONMENT=Env
Updated on March 7, 2019