Performs empirical-distribution-function goodness-of-fit tests (V.M. Cave).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | Controls printed output (`summary`, `tests`); default `summ`, `test` |
| `PLOT` = string tokens | What graphs to plot (`kerneldensity`, `histogram`); default * |
| `TEST` = string tokens | Specifies the type of goodness-of-fit test to perform (`andersondarling`, `cramervonmises`, `kolmogorovsmirnov`); default `ande`, `cram`, `kolm` |
| `DISTRIBUTION` = string tokens | Continuous distribution that is hypothesized to have generated the `DATA` (`beta`, `b2`, `burr`, `cauchy`, `chisquare`, `ev1` (or `gumbel`), `ev2` (or `frechet`), `ev3`, `expnormal`, `exponential`, `fdistribution`, `gamma`, `gev`, `gpareto`, `iburr`, `igamma`, `invnormal`, `iweibull`, `laplace`, `loggamma`, `logistic`, `loglogistic`, `lognormal`, `normal`, `paralogistic`, `pareto`, `skewnormal`, `stdnormal`, `stduniform`, `tdistribution`, `ubetamix`, `ugammamix`, `uniform`, `weibull`, `calculated`); default `norm` |
| `CONSTANT` = string tokens | Whether to estimate a constant for the distribution, when the parameter values are estimated from the `DATA` (`estimate`, `omit`); default `omit` |
| `TMETHOD` = string tokens | Specifies the method used to perform the goodness-of-fit tests (`likelihoodratio`, `traditional`); default `like` |
| `PARAMETERS` = scalar or variate | Parameter values for the hypothesized distribution; if this is not set, parameter values are estimated from the `DATA` |
| `NAMES` = text | Names to identify the parameters in `PARAMETERS`; if this is not set, the default parameter ordering is assumed |
| `CDFCALCULATION` = expression | Expression, formed using argument `X`, that defines the cumulative distribution function of the hypothesized distribution; must be specified when `DISTRIBUTION=calculated` |
| `MCPARAMETERS` = string tokens | Whether the parameters are re-estimated or fixed during the Monte-Carlo simulations, when the parameter values are estimated from the `DATA` (`fix`, `estimate`); default `esti` |
| `NTIMES` = scalar | Number of Monte-Carlo simulations to perform; default 999 |
| `SEED` = scalar | Seed for random number generation; default 0 continues an existing sequence or, if none, selects a seed automatically |
| `TITLE` = text | Title for the graphs; default generates the title automatically |
| `YTITLE` = text | Y-axis title for the graphs; default generates the title automatically |
| `XTITLE` = text | X-axis title for the graphs; default generates the title automatically |
| `WINDOW` = scalar | Window to use for the graphs; default 3 |
| `SCREEN` = string tokens | Whether to clear the screen before plotting the graph or to continue plotting on the old screen, when a single graph is requested (`clear`, `keep`); default `clear` |

### Parameters

| Parameter | Description |
|---|---|
| `DATA` = variate | Identifier of the variate holding the data |
| `STATISTIC` = pointer | Pointer to scalar(s) to save the test statistic(s) |
| `MCSTATISTICS` = pointer | Pointer to variate(s) to save the Monte-Carlo simulated test statistic(s) |
| `PROBABILITY` = pointer | Pointer to scalar(s) to save the probability value(s) of the test statistic(s) |

### Description

`EDFTEST` performs one-sample two-sided empirical-distribution-function goodness-of-fit tests to assess whether a sample of data comes from a specified continuous distribution. The data values must be supplied, in a variate, using the `DATA` parameter. The tests to be performed are specified by the `TEST` option, with settings `andersondarling` (Anderson-Darling), `cramervonmises` (Cramér-von Mises) and `kolmogorovsmirnov` (Kolmogorov-Smirnov).

The method used to perform these tests is specified by the `TMETHOD` option, with settings `likelihoodratio` for the Zhang (2002) likelihood-ratio based method, and `traditional` for the traditional approach. The default is to use the likelihood-ratio based tests, which are generally more powerful.
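For orientation, the traditional forms of the three statistics can be sketched in a few lines of Python (not Genstat). This is a minimal illustration that uses the standard textbook computing formulas for a fully specified distribution. The function name `edf_statistics` is hypothetical, and the code makes no claim to match the procedure's internal implementation. The data are the ten values from the Example section, tested against Uniform[0,1].

```python
import math

def edf_statistics(data, cdf):
    """Return the traditional (KS, CvM, AD) statistics for a fully specified CDF."""
    # Probability-integral transform, then sort: u[0] <= ... <= u[n-1].
    u = sorted(cdf(x) for x in data)
    n = len(u)
    # Kolmogorov-Smirnov: largest gap between the EDF and the hypothesized CDF.
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    d_minus = max(u[i] - i / n for i in range(n))
    ks = max(d_plus, d_minus)
    # Cramér-von Mises: integrated squared gap (computing formula).
    cvm = 1 / (12 * n) + sum((u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))
    # Anderson-Darling: squared gap, weighted towards the tails.
    ad = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i]))
        for i in range(n)
    ) / n
    return ks, cvm, ad

x = [0.621, 0.503, 0.203, 0.477, 0.710, 0.581, 0.329, 0.480, 0.554, 0.382]
ks, cvm, ad = edf_statistics(x, lambda v: v)  # Uniform[0,1] has CDF F(v) = v
print(round(ks, 3))  # 0.29
```

All three statistics measure the discrepancy between the empirical distribution function and the hypothesized one, so larger values indicate a poorer fit.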

The distribution from which the data are assumed to arise is specified using the `DISTRIBUTION` option; the default is `normal`. Values for the parameters can be supplied, in either a scalar or a variate, by the `PARAMETERS` option. However, when parameter values are supplied, a value must be specified for every parameter.

If parameter values are not supplied, they are estimated from the `DATA`, except when `DISTRIBUTION` is set to `stdnormal`, `stduniform` or `calculated`.

The `NAMES` option specifies a text to identify the individual parameter values within a variate of `PARAMETERS`. When the names are not supplied, the default ordering of the parameters is assumed. (This matches the ordering in which parameter estimates are saved using the `ESTIMATES` parameter of the `DPROBABILITY` procedure.) The parameter names are listed below, in the default parameter ordering for each distribution:

| Distribution | Parameter names (default order) |
|---|---|
| Beta Type I (`beta`) | ashape, bshape |
| Beta Type II (`b2`) | ashape, bshape, rate |
| Burr (`burr`) | ashape, scale, bshape |
| Cauchy (`cauchy`) | location, scale |
| Chi-square (`chisquare`) | df |
| Extreme Value Type I (`ev1` or `gumbel`) | location, scale |
| Extreme Value Type II (`ev2` or `frechet`) | location, scale, shape |
| Extreme Value Type III (`ev3`) | location, scale, shape |
| Exponential (`exponential`) | rate |
| Exponential modified normal (`expnormal`) | mean, sd, rate (default) or emean |
| F (`fdistribution`) | ndf, ddf |
| Gamma (`gamma`) | shape, rate, constant (optional) |
| Generalized Extreme Value (`gev`) | shape, location, scale |
| Generalized Pareto (`gpareto`) | shape, scale |
| Inverse Burr (`iburr`) | ashape, scale, bshape |
| Inverse Gamma (`igamma`) | shape, scale |
| Inverse Normal (`invnormal`) | mean, shape |
| Inverse Weibull (`iweibull`) | scale, shape |
| Laplace (`laplace`) | location, scale |
| Log-Gamma (`loggamma`) | shape, rate |
| Logistic (`logistic`) | location, scale |
| Log-Logistic (`loglogistic`) | shape, scale |
| Log-Normal (`lognormal`) | mean, sd, constant (optional) |
| Normal (`normal`) | mean, sd |
| Paralogistic (`paralogistic`) | shape, scale |
| Pareto (`pareto`) | shape, scale, constant (optional) |
| Skew Normal (`skewnormal`) | mean, sd, skewness parameter alpha |
| t (`tdistribution`) | df |
| Uniform-Beta mixture (`ubetamix`) | weight, ashape, bshape |
| Uniform-Gamma mixture (`ugammamix`) | weight, shape, scale |
| Uniform (`uniform`) | min, max |
| Weibull (`weibull`) | shape, rate, constant (optional) |
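The role of `NAMES` can be pictured as a simple reordering of the supplied values into the default parameter order. The Python sketch below (not Genstat) illustrates the idea for three of the distributions listed above. The names `DEFAULT_ORDER` and `order_parameters` are hypothetical, and the sketch ignores optional constants, the `emean` alternative and name abbreviation.

```python
# Default parameter order for a few distributions, copied from the list above.
DEFAULT_ORDER = {
    "uniform": ["min", "max"],
    "normal": ["mean", "sd"],
    "gev": ["shape", "location", "scale"],
}

def order_parameters(distribution, values, names=None):
    """Return the values arranged in the default parameter order."""
    default = DEFAULT_ORDER[distribution]
    if len(values) != len(default):
        raise ValueError("a value must be supplied for every parameter")
    if names is None:
        return list(values)  # values are assumed to be in default order already
    if sorted(names) != sorted(default):
        raise ValueError(f"expected names {default}, got {list(names)}")
    lookup = dict(zip(names, values))
    return [lookup[name] for name in default]

# Mirrors PARAMETERS=!(1,0); NAMES=!t(max,min) from the Example section.
print(order_parameters("uniform", [1, 0], names=["max", "min"]))  # [0, 1]
```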

The Gamma, Log-Normal, Pareto and Weibull distributions can have an extra constant parameter, so that the data values minus the constant then follow the specified distribution. When `PARAMETERS` are not supplied, you can set option `CONSTANT=estimate` to estimate a constant from the `DATA`. The default is not to estimate a constant.
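The effect of the constant is simply to shift the argument of the distribution function. A minimal Python sketch (not Genstat) for the Log-Normal case follows. It assumes that `mean` and `sd` refer to the logarithm of the shifted data, which is the usual convention but is an assumption here. The function name `shifted_lognormal_cdf` is hypothetical.

```python
import math
from statistics import NormalDist

def shifted_lognormal_cdf(x, mean, sd, constant=0.0):
    """CDF of X when log(X - constant) is Normal(mean, sd)."""
    if x <= constant:
        return 0.0  # no probability mass at or below the constant
    return NormalDist(mean, sd).cdf(math.log(x - constant))

# With constant = 10, the value 11 sits at the median of a Log-Normal(0, 1).
print(shifted_lognormal_cdf(11.0, mean=0.0, sd=1.0, constant=10.0))  # 0.5
```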

The Exponentially modified Normal can have two parameterizations, with the third parameter as either *emean* (mean of the exponential distribution) or the exponential rate (reciprocal of the mean). `DPROBABILITY` estimates and returns the exponential rate, but in some cases it is easier to provide the mean. The third unit of `NAMES` indicates whether rate or *emean* has been provided; if it is not set, a rate parameter is assumed.

The `DISTRIBUTION` option provides the common distributions. Alternatively, for traditional tests (i.e. `TMETHOD=traditional`) you can set `DISTRIBUTION=calculated` to define your own distribution. You must then use the `CDFCALCULATION` option to provide an expression, formed using argument `X`, to calculate the cumulative distribution function. For example, the `exponential` distribution with a rate parameter of 2 could be specified by setting options `DISTRIBUTION=calculated` and `CDF=!E(X=1-EXP(-2*X))`.
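Conceptually, a user-defined distribution is just a function that maps each data value to a cumulative probability. The Python sketch below (not Genstat) applies the same exponential CDF with rate 2 inside a Kolmogorov-Smirnov statistic. The helper names and the sample values are hypothetical and serve only as an illustration.

```python
import math

def exponential_cdf(x, rate=2.0):
    """F(x) = 1 - exp(-rate * x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def ks_statistic(data, cdf):
    """Largest gap between the empirical and the hypothesized CDF."""
    u = sorted(cdf(v) for v in data)
    n = len(u)
    return max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))

sample = [0.05, 0.21, 0.34, 0.62, 1.10]  # hypothetical data, for illustration only
d = ks_statistic(sample, exponential_cdf)
print(0.0 < d < 1.0)  # True
```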

Monte-Carlo simulations are used to calculate the empirical probability values of the test statistics under the likelihood-ratio based method (i.e. `TMETHOD=likelihoodratio`), or, by default, under the traditional method when the parameters are estimated from the `DATA`. The `NTIMES` option defines how many Monte-Carlo simulations are used; default 999. The `SEED` option can be set to initialize the random-number generator used during the Monte-Carlo simulations; if the procedure is called again with the same settings, you will get identical results. The default of zero continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, the seed is initialized automatically.
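The general shape of a Monte-Carlo probability calculation can be sketched in Python (not Genstat) as follows. This is an illustration under stated assumptions: it uses the Kolmogorov-Smirnov statistic with a fully specified Uniform[0,1] distribution, and it counts the observed statistic as one member of the null distribution, giving the common estimate (r + 1) / (`NTIMES` + 1). The exact formula used by the procedure is not stated in this page. The function names are hypothetical.

```python
import random

def ks_statistic(data, cdf):
    """Largest gap between the empirical and the hypothesized CDF."""
    u = sorted(cdf(v) for v in data)
    n = len(u)
    return max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))

def monte_carlo_p_value(data, cdf, sampler, ntimes=999, seed=1234):
    """Empirical probability of a statistic at least as large as the observed one."""
    rng = random.Random(seed)  # a fixed seed makes the result reproducible
    observed = ks_statistic(data, cdf)
    n = len(data)
    exceed = 0
    for _ in range(ntimes):
        simulated = [sampler(rng) for _ in range(n)]  # sample from the null
        if ks_statistic(simulated, cdf) >= observed:
            exceed += 1
    # The observed statistic counts as one draw from the null distribution.
    return (exceed + 1) / (ntimes + 1)

x = [0.621, 0.503, 0.203, 0.477, 0.710, 0.581, 0.329, 0.480, 0.554, 0.382]
p1 = monte_carlo_p_value(x, lambda v: v, lambda rng: rng.random())
p2 = monte_carlo_p_value(x, lambda v: v, lambda rng: rng.random())
print(p1 == p2)  # True
```

With 999 simulations the smallest attainable probability is 1/1000, which is one reason for choosing `NTIMES` so that `NTIMES` + 1 is a round number.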

By default, when the parameters are estimated from the `DATA`, they are re-estimated from each simulated data set during the Monte-Carlo simulations, to ensure that the correct probability values are obtained. However, this can be overridden by setting the `MCPARAMETERS` option to `fix`.
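The difference between the two `MCPARAMETERS` settings can be sketched in Python (not Genstat) for a Normal distribution. This is a minimal illustration, assuming maximum-likelihood estimates (the mean and the divisor-n standard deviation) and the Kolmogorov-Smirnov statistic. The function names are hypothetical and the code is not the procedure's implementation.

```python
import random
from statistics import NormalDist, fmean, pstdev

def ks_statistic(data, cdf):
    """Largest gap between the empirical and the hypothesized CDF."""
    u = sorted(cdf(v) for v in data)
    n = len(u)
    return max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))

def fit_normal(data):
    """Maximum-likelihood Normal fit (pstdev uses divisor n)."""
    return NormalDist(fmean(data), pstdev(data))

def mc_p_value(data, reestimate=True, ntimes=999, seed=1234):
    rng = random.Random(seed)
    fitted = fit_normal(data)
    observed = ks_statistic(data, fitted.cdf)
    exceed = 0
    for _ in range(ntimes):
        sim = [rng.gauss(fitted.mean, fitted.stdev) for _ in range(len(data))]
        # "estimate": refit to every simulated sample, mirroring how the
        # observed statistic was obtained; "fix": reuse the original fit.
        null = fit_normal(sim) if reestimate else fitted
        if ks_statistic(sim, null.cdf) >= observed:
            exceed += 1
    return (exceed + 1) / (ntimes + 1)

x = [0.621, 0.503, 0.203, 0.477, 0.710, 0.581, 0.329, 0.480, 0.554, 0.382]
p_estimate = mc_p_value(x, reestimate=True)
p_fix = mc_p_value(x, reestimate=False)
```

Fitting the parameters pulls the hypothesized distribution towards the data, so the observed statistic tends to be smaller than it would be with known parameters. Re-estimating in each simulation applies the same adjustment to the null distribution, whereas fixing the parameters typically yields probabilities that are too large.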

Printed output is controlled by the `PRINT` option, with settings:

| Setting | Output |
|---|---|
| `summary` | to print summary information; and |
| `tests` | to print the test statistic(s), with its probability value(s) under the assumption that the data are from the hypothesized distribution (so a low probability indicates that the data are unlikely to be from the hypothesized distribution). |

The printed output can be suppressed by setting option `PRINT=*`. The default is to print the summary and the tests.

The `PLOT` option controls graphical output, with settings:

| Setting | Output |
|---|---|
| `histogram` | to plot a histogram of the Monte-Carlo simulated test statistics; and |
| `kerneldensity` | to produce a kernel density plot of the Monte-Carlo simulated test statistics. |

By default, nothing is plotted.

The `TITLE`, `YTITLE` and `XTITLE` options can supply an overall title, a y-axis title and an x-axis title for the graphs, respectively. If these are not supplied, suitable titles are generated automatically. When a single plot is requested, you can set option `SCREEN=keep` to plot the graph on an existing screen; by default the screen is cleared first. The `WINDOW` option defines the window to use for the plots; default 3.

The `STATISTIC`, `PROBABILITY` and `MCSTATISTICS` parameters allow the test statistics, their probabilities and the Monte-Carlo simulated test statistics to be saved, respectively, in pointers.

Options: `PRINT`, `PLOT`, `TEST`, `DISTRIBUTION`, `CONSTANT`, `TMETHOD`, `PARAMETERS`, `NAMES`, `CDFCALCULATION`, `MCPARAMETERS`, `NTIMES`, `SEED`, `TITLE`, `YTITLE`, `XTITLE`, `WINDOW`, `SCREEN`.

Parameters: `DATA`, `STATISTIC`, `MCSTATISTICS`, `PROBABILITY`.

### Method

If `TMETHOD=traditional`, `EDFTEST` calculates the traditional Anderson-Darling, Cramér-von Mises and Kolmogorov-Smirnov goodness-of-fit tests. When `PARAMETERS` are supplied (or if `MCPARAMETERS=fix`), the probability of the Anderson-Darling test statistic is calculated using the fast algorithm (adinf) of Marsaglia & Marsaglia (2004), the probability of the Cramér-von Mises test statistic is calculated using the one-term linking approximation (equation 1.8) of Csörgő & Faraway (1996), and the probability of the Kolmogorov-Smirnov test statistic is calculated using the method of Carvalho (2015) for data sets with fewer than 171 values or using the Wang *et al.* (2003) approximation for larger data sets. When `PARAMETERS` are not supplied, Monte-Carlo simulation is used by default to obtain empirical probability values of the test statistics. However, empirical probability values are not available for `DISTRIBUTION=ubetamix` or `ugammamix`.

If `TMETHOD=likelihoodratio`, `EDFTEST` calculates likelihood-ratio based goodness-of-fit test statistics using the method of Zhang (2002). (Note, however, that the likelihood-ratio based method is not available for `DISTRIBUTION=ubetamix`, `ugammamix`, or `calculated`.) The resulting tests are generally more powerful than their traditional analogues. Monte-Carlo simulation is used to obtain empirical probability values of the test statistics.
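For readers who want to see the form of the likelihood-ratio statistics, the Python sketch below (not Genstat) computes the three statistics usually denoted Z_K, Z_C and Z_A, the analogues of the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling statistics. The formulas are written as they are commonly quoted from Zhang (2002) and should be checked against the paper before being relied on. Whether the procedure evaluates them in exactly this form is an assumption. The function name `zhang_statistics` is hypothetical.

```python
import math

def zhang_statistics(data, cdf):
    """Return (Z_K, Z_C, Z_A) for a fully specified continuous CDF."""
    u = sorted(cdf(v) for v in data)  # u[i - 1] is F0 at the i-th order statistic
    n = len(u)
    idx = range(1, n + 1)
    # Z_K: maximum over i of a binomial likelihood-ratio term.
    z_k = max(
        (i - 0.5) * math.log((i - 0.5) / (n * u[i - 1]))
        + (n - i + 0.5) * math.log((n - i + 0.5) / (n * (1 - u[i - 1])))
        for i in idx
    )
    # Z_C: sum of squared log odds ratios.
    z_c = sum(
        math.log((1 / u[i - 1] - 1) / ((n - 0.5) / (i - 0.75) - 1)) ** 2
        for i in idx
    )
    # Z_A: weighted sum of log probabilities in the two tails.
    z_a = -sum(
        math.log(u[i - 1]) / (n - i + 0.5) + math.log(1 - u[i - 1]) / (i - 0.5)
        for i in idx
    )
    return z_k, z_c, z_a

x = [0.621, 0.503, 0.203, 0.477, 0.710, 0.581, 0.329, 0.480, 0.554, 0.382]
z_k, z_c, z_a = zhang_statistics(x, lambda v: v)  # Uniform[0,1]
```

As with the traditional statistics, large values are evidence against the hypothesized distribution. The null distributions are not available in a convenient closed form, which is why probabilities are obtained by simulation.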

When `PARAMETERS` are not supplied, maximum-likelihood estimates are obtained using the methods in the `DPROBABILITY` procedure. When `MCPARAMETERS=estimate`, the parameter values are re-estimated for each simulated data set using the `DPROBABILITY` procedure.

The kernel-density plot is generated by the `KERNELDENSITY` procedure, using the method of Sheather & Jones (1991), with the default number of grid points. The simulated test statistics are plotted using red `+` symbols along the x-axis, and the location of the test statistic is denoted by a blue line. As the observed test statistic contributes to the null distribution, it is included in the calculation of both the kernel density and histogram.

### Action with `RESTRICT`

The `DATA` variate can be restricted to assess a subset of the data.

### References

Carvalho, L. (2015). An improved evaluation of Kolmogorov’s distribution. *Journal of Statistical Software*, 65(3), 1-7.

Csörgő, S. & Faraway, J.J. (1996). The exact and asymptotic distributions of Cramér-von Mises statistics. *Journal of the Royal Statistical Society, Series B*, 58, 221-234.

Marsaglia, G. & Marsaglia, J. (2004). Evaluating the Anderson-Darling distribution. *Journal of Statistical Software*, 9(2), 1-5.

Sheather, S.J. & Jones, M.C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. *Journal of the Royal Statistical Society, Series B*, 53, 683-690.

Wang, J., Tsang, W.W. & Marsaglia, G. (2003). Evaluating Kolmogorov’s distribution. *Journal of Statistical Software*, 8(18), 1-4.

Zhang, J. (2002). Powerful goodness-of-fit tests based on the likelihood ratio. *Journal of the Royal Statistical Society, Series B*, 64, 281-294.

### See also

Directive: `DISTRIBUTION`.

Procedures: `DPROBABILITY`, `NORMTEST`, `KOLMOG2`, `WSTATISTIC`.

Commands for: Basic and nonparametric statistics.

### Example

    CAPTION 'EDFTEST example',\
            !t('Random sample of size 10 assumed to come from the Uniform distribution.'),\
            !t('From W.J. Conover (1980), Practical Nonparametric Statistics 2ed, pg 348.');\
            STYLE=meta,plain,plain
    VARIATE [VALUES=0.621,0.503,0.203,0.477,0.710,0.581,0.329,0.480,0.554,0.382] x
    "Assuming a Uniform[0,1] distribution."
    "Likelihood-ratio based tests with histograms of the Monte-Carlo test statistics."
    EDFTEST [PLOT=histogram; DISTRIBUTION=uniform; PARAMETERS=!(1,0); NAMES=!t(max,min);\
            SEED=1234; NTIMES=999] x
    "Traditional tests."
    EDFTEST [TMETHOD=traditional; DISTRIBUTION=uniform; PARAMETERS=!(1,0);\
            NAMES=!t(max,min)] x
    "Estimating parameter values from the data."
    "Likelihood-ratio based tests with kernel density plots of the Monte-Carlo test statistics."
    EDFTEST [PLOT=kerneldensity; DISTRIBUTION=uniform; SEED=1234; NTIMES=999] x
    "Traditional tests with kernel density plots of the Monte-Carlo test statistics."
    EDFTEST [TMETHOD=traditional; PLOT=kerneldensity; DISTRIBUTION=uniform; SEED=1234;\
            NTIMES=999] x