GEV Procedure

Fits a Generalized Extreme Value distribution to a variate. (D.B. Baird)

The Generalized Extreme Value distribution asymptotically models block maxima from any distribution with a stable maximum value distribution.

The modelling of maxima is used to answered questions about how often in the future values larger than a certain value may occur. By modelling the distribution, if the assumptions are true, then a smaller data series may be extrapolated to an event occurring more rarely than the length of the series. For example, if you want to estimate a 1 in 100 year event, but only have 40 years of data, you must extrapolate. This approach also provides some smoothing of the distribution so that the information is taken from all values, and not just the most extreme few events that have occurred in a short series. The value which is exceeded 1 in n years is known as the return level, and 1/n is the return probability.

The maxima are assumed to come from the maximum of a fixed number of observations, which are identically and independently distributed, and that the process generating the observations is stationary. Departures from these assumptions, for example that there is a trend or correlations over the series of observations or that the maxima are of different sized blocks can affect the validity of the analysis.

A maximum stable distribution must take one of three forms, the Gumbel (Extreme Value Type 1), Frechet (Extreme Value Type 2) or Weibull (Extreme Value Type 3) distributions. The main difference between the distributions is the behaviour of the upper tail. For the Weibull distribution, the maximum tail value is finite, whereas it is infinite for the other two distributions. The upper tail of the Gumbel distribution decays exponentially, whereas it decays polynomially for the Frechet. These three distributions can be combined into a single distribution, the Generalized Extreme Value distribution that has a shape parameter, eta, which specifies which of the three families the particular GEV distribution belongs to. If eta = 0, the distribution is Gumbel, if eta > 0, it is Frechet, and if eta < 0, Weibull. If plotted on a log-log scale, the Gumbel distribution will give a straight line, the Frechet will curve up, and the Weibull will tend to a horizontal asymptote.

The GEV distribution has three parameters, a location parameter, mu, a scale parameter, sigma, and the shape parameter, eta. The GEV cumulative distribution function (CDF) is given as:

G(x) = exp{-[1 + eta*(x - mu)/sigma]**(-1/eta)},  for {x: 1 + eta*(x - mu)/sigma > 0}.

For eta = 0, the CDF above is undefined, but in the limit as eta tends to 0 this gives the usual Gumbel CDF of:

G(x) = exp{ -exp[(x - mu)/sigma]},  for any x.

Minima may also be modelled by changing the sign of the variate.

When GROUPS and/or TREND are set, the estimated return probabilities and levels are estimated for the first level of the GROUP factor, and at the mean of the TREND variate. Also, any graphs are displayed using these standardized values. You can standardize the estimated values to a different value of TREND by setting the STANDARD option.

Options

PRINT = strings	What to print (model, estimates, tests, fittedvalues, monitoring, all); default model, estimates, tests
PLOT = strings	Which graphs to plot, (qq, pp, etaprofile, density, returns, rprofile, all); default qq, returns, density
ENVELOPE = string	Include confidence envelopes on plots (yes, no); default yes
CIMETHOD = string	How to calculate confidence limits (exact, quadratic); default quadratic
CDIRECTION = string	Direction of censoring (left, right), default right i.e. real value >= X. Left censoring is where the true value is less than the given value X
ALPHA = scalar	Significance Level on confidence bands; default – 0.95
ETA = scalar	Value of ETA to force a GEV with this value; default *, estimate ETA by Maximum Likelihood
STANDARD = scalar	Value of the TREND to standardize return levels and probabilities to; default * – use the mean value of TREND
TITLE = text	Text containing the title for each plot (each row is a title)
WINDOW = scalar	Which window to plot the graphs in; default – 3

Parameters

DATA = variate	Variate containing Data to fit the GEV distribution to
GROUPS = factor	Factor giving groups with different means
TREND = variate	Variate giving explanatory term to fit a linear regression for the location parameter
CENSOR = variate	Variate giving 1 for censored values, 0 otherwise
ESTIMATES = variate	Estimated values for MU,SIGMA and ETA and TREND and GROUPS parameters
SE = variate	Standard errors for MU,SIGMA and ETA and TREND and GROUPS parameters
RETURNS = variate	If SET for input, the values to estimate the return probabilities for, otherwise the estimated return periods for values in PROBABILITY
PROBABILITY = variate	If SET for input, the values to estimate the return levels for, otherwise the estimated probabilities for values in RETURNS
LOWER = variate	Lower limit of ALPHA confidence interval for either the estimated values of RETURNS or PROBABILITY depending which values have been input
UPPER = variate	Upper limit of ALPHA confidence interval for either the estimated values of RETURNS or PROBABILITY depending which values have been input
LOGLIKELIHOOD = loglikelihood	Log-likelihood of fitted distribution
EXIT = scalar	The exit code from FITNONLINEAR used to obtain the maximum likelihood estimates

Options: PRINT, PLOT, ENVELOPE, CIMETHOD, CDIRECTION, ALPHA, ETA, STANDARD, TITLE, WINDOW

Parameters: DATA, GROUPS, TREND, CENSOR, ESTIMATES, SE, RETURNS, PROBABILITY, LOWER, UPPER, LOGLIKELIHOOD, EXIT

Description

The GEV procedure fits the Generalized Extreme Value distribution by maximum likelihood.

The PRINT option controls what results are displayed from the analysis (model, estimates, tests, fittedvalues, monitoring). Setting PRINT=all will display all available results. The model setting gives the function for the CDF and details of the model being fitted, the estimates gives the estimated values for mu, sigma and eta, the tests setting gives a likelihood ratio test of eta being zero and a goodness of fit test for data following the GEV distribution, the fittedvalues setting a table of the observed return levels and the fitted values for each of these, and the monitoring setting gives a trace of the parameters and likelihood from any profile likelihoods being estimated (which can be quite slow).

The PLOT option controls what graphs are displayed from the analysis (qq, pp, etaprofile, density, returns, rprofile). Setting PLOT=all will generate all available graphs. The quantile-quantile (qq) and probability-probability (pp) graphs are produced using the DPROBABILITY directive. The etaprofile and rprofile settings give profile likelihood plots for eta and the return levels respectively. The density setting gives a combined histogram, dot plot and probability density function on the same graph. The returns setting gives a plot of the data and estimated return levels against the return period. The return period is the reciprocal of the return probability (an n year return period = a 1 in n year return probability). The TITLE option provides titles for the selected plots. Each line in the specified text structure provides a title for the plots. The WINDOW option specifies the FRAME to plot the graphs within.

The ENVELOPE option can be set to control whether confidence bands are include on the graphs. The ALPHA option controls the confidence limits used (default 95%) for the bands and for the estimated return levels, probabilities and profile likelihoods. The CIMETHOD option controls whether approximate or profilelikelihood confidence limits are estimated for the return levels. If profile likelihood limits are chosen, a profile likelihood curve for the return levels is calculated. Profile likelihood confidence limits can be very slow to calculate, but are more accurate. The PLOT=rprofile setting is only active if CIMETHOD=exact. Profile likelihood confidence limits are never available for estimated return probabilities.

The ETA option can be used to force a given value of eta to be used for the distribution. A common setting for this would be ETA=0 which forces the Gumbel distribution to be used.

Checking of stationarity for the values for either different groups or a trend with an associated variate can be done with the GROUPS and TREND parameters respectively. If the -2 times change in the log-likelihood is greater than a Chi-square deviate for the appropriate degrees of freedom (number of groups – 1, or 1 respectively, then the GROUPS or TREND parameters are significant. This calculation can be performed by saving the log-likelihoods for the respective fits with and without the term using the LOGLIKELIHOOD parameter, and then calculating the likelihood ratio test with the statement:

CALC ChiPr = CUCHISQUARE(-2*(loglik1 - loglik0);df)

The CENSOR parameter allows for data that have not been completely observed, but a value for which the true value must be less than (left censoring) or greater than (right censoring) has been observed. The CDIRECTION option specifies the direction of censoring (Left or Right). Left or right censored values are added to the likelihood using the CDF or 1-CDF respectively, rather using the normal probability density function. If censoring is used, then various of the plots produced are adjusted to allow for the censored values.

The EXIT parameter allows the exit code from FITNONLINEAR to be returned in a scalar so that you can check whether the estimation has converged. The estimated parameters and their standard errors can be saved in the structures set for ESTIMATES and SE respectively.

The RETURNS and PROBABILITIES parameters allow you to specify values that you want the corresponding return probabilities or levels estimated for, respectively. If the values in the variate for RETURNS are set, then the estimated return probabilities are given in PROBABILITIES and the lower and upper confidence limits for the return probabilities are in given in LOWER and UPPER respectively. If the values in the variate for PROBABILITIES are set, then the estimated return levels are given in RETURNS and the lower and upper confidence limits for the return levels are in given in LOWER and UPPER respectively.

Method

The log-likelihood function is set up using EXPRESSION statements and maximum likelihood estimates formed by using FITNONLINEAR. Firstly, an optimal value of eta is found by forming a profile likelihood for values of eta varying over a range of values, and then the full likelihood is maximised using all parameters, starting from the optimum found from the profile likelihood. To form exact confidence limits, the likelihood is reparameterised using the return level as a parameter, and a profile likelihood formed by varying the return level, and optimising over the remaining parameters.

Action with RESTRICT

Restricted units are left out of the analysis.

References

Coles, Stuart (2001). An introduction to statistical modelling of extreme values. Springer-Verlag: London.
Reiss R and Thomas M (2001). Statistical Analysis of Extreme Values, 2nd edition. Birkhauser: Basel.

Example

The file Windspeed.GSH contains annual maximum wind speeds (in mph) from 1950 to 1979. The storms generating these windows are classified into tropical and non-tropical storms.

Treating all these wind speeds as a single group gives the following analysis and graphs.

Genstat Code

SPLOAD '%GENDIR%\\EXAMPLES\\WindSpeed.GSH' 
"Fit a Generalized Extreme Value Distribution to Maxima" 
GEV [PRINT=model,estimates,tests; PLOT=qq,etaprofile,density,returns;\  
    CIMETHOD=quadratic; ENVELOPE=yes; ALPHA=0.95; WINDOW=3]\  
    DATA=mph; PROBABILITY=0.05 
" Check for trend over time in maximum wind speeds " 
GEV [PRINT=model,estimates; PLOT=*;ETA=0] DATA=mph; TREND=Year 
" Check for difference in storm type " 
GEV [PRINT=model,estimates; PLOT=*] DATA=mph; GROUPS=Type 
" Estimate Non-Tropical storm mean, allowing for censoring due
  to tropical storms " 
GEV [PRINT=model,estimates; PLOT=*;CDIRECTION=Left]\  
    DATA=mph; CENSOR=Censor; PROBABILITY=0.05

Output

 Generalized Extreme Value Distribution: 
   CDF(x) = EXP(-[1 + Eta*(x - Mu)/Sigma]**(-1/Eta)), Eta0 
          = EXP(-EXP(-(x - Mu)/Sigma),                Eta==0

 *** Estimates of GEV parameters *** 
              estimate    "s.e." 
 Mu              43.38     2.056 
 Sigma           6.962     1.556 
 Eta           0.09417    0.2181

 Maximum Log-Likelihood = -107.25

 Maximum value of GEV Distribution is Infinite (Eta >= 0)

 Significance Test that Eta = 0 (ie mph follows a Gumbel distribution)

   Likelihood Ratio test statistic: 0.411 
   Chi-Square Probability of test: 0.5215

 Goodness of Fit Test for mph following a GEV distribution 
 ---------------------------------------------------------

         Critical values of test statistics (MARGINAL tests) 
         --------------------------------------------------------- 
                                       Significance level 
                             ------------------------------------- 
          Test statistic      15%     10%      5%     2.5%     1% 
         --------------------------------------------------------- 
         Anderson-Darling    0.576   0.656   0.787   0.918   1.092 
         Cramer-von Mises    0.091   0.104   0.126   0.148   0.178 
         Watson              0.085   0.096   0.116   0.136   0.163 
         ---------------------------------------------------------

 --------------------------------------------------------------- 
                                         Test statistic 
                               --------------------------------- 
 Type of                       Anderson-     Cramer- 
   test        Variate(s)       Darling     von Mises     Watson 
 --------------------------------------------------------------- 
 Marginal             1          0.232        0.029        0.029 
 --------------------------------------------------------------- 
 ?, *, ** indicate significance at 10%, 5% and 1% levels respectively

 95.0 % Profile Likelihood Interval for Eta: ( -0.1739 0.4562 )

 95.0 % Approximate Intervals for Return Periods 
 --------------------------------------------------------- 
   Probability Return Period     Level     Lower     Upper 
       0.05000         20.00     67.24     50.08     84.40

From this analysis, there is no evidence of lack of fit to the GEV distribution (the Anderson-Darling, Cramer-von Mises and Watson tests statistics are all non significant), and a shape parameter of 0 (i.e. a Gumbel distribution) could be used (as the Likelihood ratio test for eta = 0 is non significant)

Graphs

	This graph plots the observed values against the expected values. There is a slight indication of some of the higher wind speeds (3 and 4 largest) lying above the expected line, but as these are all contained within the 95% confidence bands, the departure is minor.
	The value of eta lies within the confidence limits (-0.17, 0.46).
	The fitted density seems a reasonable fit to the histogram of wind speeds.
	This plot can be used to read off return levels of the y-axis for any given return period (1 in n years, n given along the x-axis).

The data can be examined to see if there is any trend over years, and whether there is a difference between the two types of storms: tropical and non- tropical. However, in this case we don’t have every reading each year of the two types of storm, thus the analysis between groups is not the correct one (although we can perform it anyway). We can,however, estimate the distribution for the non-tropical storms using the fact that when we have a tropical storm with the maximum wind speed, we know that the maximum non-tropical storm that year must have been less than this value. Thus the tropical storms values can be taken as left censored values. Using this censoring approach can then give us a prediction of a 1 in 20 year tropical storm.

 Gumbel Extreme Value Distribution (GEV with Eta = 0): 
   CDF(x) = EXP(-EXP(-(x - Mu)/Sigma)

 Fitting Trend term: Year

 *** Estimates of Gumbel parameters *** 
              estimate    "s.e." 
 Mu(Intercept)     16.71         * 
 Sigma             7.305         * 
 Eta                   0     FIXED 
 Slope(Year)     0.01377         *

 Maximum Log-Likelihood = -107.45

  Gumbel Extreme Value Distribution (GEV with Eta = 0): 
   CDF(x) = EXP(-EXP(-(x - Mu)/Sigma)

 Fitting Groups term: Type

 *** Estimates of Gumbel parameters *** 
              estimate    "s.e." 
 Mu(Tropical)     43.39     3.677 
 Sigma            7.221     1.499 
 Eta                  0     FIXED 
 Non-tropical    0.4697     4.218

 Maximum Log-Likelihood = -107.44

 Gumbel Extreme Value Distribution (GEV with Eta = 0): 
   CDF(x) = EXP(-EXP(-(x - Mu)/Sigma)

 8 data points are right censored

 *** Estimates of Gumbel parameters *** 
              estimate    "s.e." 
 Mu              45.92     2.447 
 Sigma           8.555     2.039 
 Eta                 0     FIXED

 Maximum Log-Likelihood = -85.65

95.0 % Approximate Intervals for Return Periods 
 --------------------------------------------------------- 
   Probability Return Period     Level     Lower     Upper 
       0.05000         20.00     71.33     56.91     85.74

It can be seen that there is no significant effect of Year or Type (comparing the -2 times the change in log-likelihood with a Chi-square distribution with 1 degree of freedom (95% limit = 3.84)). The estimated 1 in 20 year return level for a non-tropical storm is 71.3 compared with 67.2 for all storms.