Performs empirical-distribution-function goodness-of-fit tests (V.M. Cave).
|PRINT|Controls printed output (description, test)|
|PLOT|What graphs to plot (histogram, kerneldensity); by default nothing is plotted|
|TEST|Specifies the type of goodness-of-fit test to perform (andersondarling, cramervonmises, kolmogorovsmirnov)|
|DISTRIBUTION|Continuous distribution that is hypothesized to have generated the DATA; default normal|
|CONSTANT|Whether to estimate a constant for the distribution, when the parameter values are estimated from the DATA (estimate); by default no constant is estimated|
|TMETHOD|Specifies the method used to perform the goodness-of-fit tests (likelihoodratio, traditional); default likelihoodratio|
|PARAMETERS|Parameter values for the hypothesized distribution; if this is not set, parameter values are estimated from the DATA|
|NAMES|Names to identify the parameters in PARAMETERS|
|CDFCALCULATION|Expression, formed using argument X, to calculate the cumulative distribution function when DISTRIBUTION=calculated|
|MCPARAMETERS|Whether the parameters are re-estimated or fixed during the Monte-Carlo simulations, when the parameter values are estimated from the DATA (estimate, fix); default estimate|
|NTIMES|Number of Monte-Carlo simulations to perform; default 999|
|SEED|Seed for random number generation; default 0 continues an existing sequence or, if none, selects a seed automatically|
|TITLE|Title for the graphs; default generates the title automatically|
|YTITLE|Y-axis title for the graphs; default generates the title automatically|
|XTITLE|X-axis title for the graphs; default generates the title automatically|
|WINDOW|Window to use for the graphs; default 3|
|SCREEN|Whether to clear the screen before plotting the graph or to continue plotting on the old screen, when a single graph is requested (clear, keep); default clear|
|DATA|Identifier of the variate holding the data|
|STATISTICS|Pointer to scalar(s) to save the test statistic(s)|
|MCSTATISTICS|Pointer to variate(s) to save the Monte-Carlo simulated test statistic(s)|
|PROBABILITIES|Pointer to scalar(s) to save the probability value(s) of the test statistic(s)|
EDFTEST performs one-sample two-sided empirical-distribution-function goodness-of-fit tests to assess whether a sample of data comes from a specified continuous distribution. The data values must be supplied, in a variate, using the
DATA parameter. The types of test to be performed are specified by the
TEST option, with settings andersondarling (Anderson-Darling),
cramervonmises (Cramér-von Mises) and kolmogorovsmirnov (Kolmogorov-Smirnov).
The method used to perform these tests is specified by the
TMETHOD option, with settings
likelihoodratio for the Zhang (2002) likelihood-ratio based method, and
traditional for the traditional approach. The default is to use the likelihood-ratio based tests, which are generally more powerful.
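The Python sketch below gives a rough idea of what the three traditional statistics measure. It computes them for a fully specified continuous distribution, using the standard textbook computing formulas on the ordered transformed values u(i) = F(x(i)). It is not Genstat code and is not claimed to reproduce EDFTEST's internal calculations. The function name edf_statistics is hypothetical. For the Conover (1980) sample used in the example, tested against a Uniform[0,1] distribution, it gives a Kolmogorov-Smirnov statistic D = 0.29 and a Cramér-von Mises statistic of about 0.248.

```python
import math

def edf_statistics(data, cdf):
    """Anderson-Darling A2, Cramer-von Mises W2 and Kolmogorov-Smirnov D
    for a fully specified continuous CDF (A2 needs 0 < F(x) < 1)."""
    n = len(data)
    u = sorted(cdf(x) for x in data)  # probability-integral transform, ordered
    # Kolmogorov-Smirnov: largest gap between the EDF and F, above or below
    d = max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))
    # Cramer-von Mises: squared deviations from the plotting positions (2i-1)/(2n)
    w2 = 1 / (12 * n) + sum((ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u))
    # Anderson-Darling: gives more weight to the tails
    s = sum((2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i])) for i in range(n))
    a2 = -n - s / n
    return a2, w2, d

# Conover (1980) sample from the example, tested against Uniform[0, 1], where F(x) = x
x = [0.621, 0.503, 0.203, 0.477, 0.710, 0.581, 0.329, 0.480, 0.554, 0.382]
a2, w2, d = edf_statistics(x, lambda v: v)
print(f"A2={a2:.4f}  W2={w2:.4f}  D={d:.3f}")
```

Probability values are not computed here. They come from the methods described in the Method section.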
The distribution from which the data are assumed to arise is specified using the
DISTRIBUTION option; default
normal. Values for the parameters can be supplied, in either a scalar or a variate, by the
PARAMETERS option. However, when parameter values are supplied, a value must be specified for every parameter.
If parameter values are not supplied, they are estimated from the
DATA, except when
DISTRIBUTION is set to calculated. The
NAMES option specifies a text to identify the individual parameter values within a variate of
PARAMETERS. If NAMES is not set, the parameter values are assumed to be in the default ordering for the distribution, which matches the ordering in which parameter estimates are saved using the
ESTIMATES parameter of the
DPROBABILITY procedure. The parameter names for each distribution are listed below, in this default ordering:
|Distribution|Parameters (default order)|
|Beta Type I||
|Beta Type II|ashape, bshape, rate|
|Burr|ashape, scale, bshape|
|Cauchy||
|Chi-square||
|Extreme Value Type I||
|Extreme Value Type II|location, scale, shape|
|Extreme Value Type III|location, scale, shape|
|Exponential||
|Exponentially modified normal|mean, sd, rate (default) or emean|
|F||
|Gamma|shape, rate, constant (optional)|
|Generalized Extreme Value|shape, location, scale|
|Generalized Pareto||
|Inverse Burr|ashape, scale, bshape|
|Inverse Gamma||
|Inverse Normal||
|Inverse Weibull||
|Laplace||
|Log-Gamma||
|Logistic||
|Log-Logistic||
|Log-Normal|mean, sd, constant (optional)|
|Normal||
|Paralogistic||
|Pareto|shape, scale, constant (optional)|
|Skew Normal|mean, sd, skewness parameter alpha|
|t||
|Uniform-Beta mixture|weight, ashape, bshape|
|Uniform-Gamma mixture|weight, shape, scale|
|Uniform||
|Weibull|shape, rate, constant (optional)|
The Gamma, Log-Normal, Pareto and Weibull distributions can have an extra constant parameter, so that the data values minus the constant follow the specified distribution. When
PARAMETERS are not supplied, you can set option
CONSTANT=estimate to estimate the constant from the
DATA. The default is not to estimate a constant.
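The effect of the constant can be sketched in Python. If the data minus a constant c follow a distribution with CDF G, then the data themselves have CDF F(x) = G(x - c). The Weibull form used here, F(y) = 1 - exp(-(rate*y)^shape), is an assumed parameterization chosen for illustration; it is not taken from Genstat's documentation. The helper names shifted_cdf and weibull_cdf are hypothetical.

```python
import math

def shifted_cdf(cdf, constant):
    """Return the CDF of X when X - constant has CDF `cdf`."""
    return lambda x: cdf(x - constant)

def weibull_cdf(shape, rate):
    """Weibull CDF in an assumed shape/rate parameterization (illustration only)."""
    def cdf(y):
        if y <= 0:
            return 0.0
        return 1.0 - math.exp(-((rate * y) ** shape))
    return cdf

# Data values minus 5 are assumed to follow a Weibull(shape=1.5, rate=2) distribution,
# so no probability lies below 5.
F = shifted_cdf(weibull_cdf(1.5, 2.0), 5.0)
```

Estimating the constant means choosing c together with the other parameters, which this sketch does not attempt.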
The exponentially modified normal distribution has two parameterizations, in which the third parameter is either emean (the mean of the exponential component) or the exponential rate (the reciprocal of that mean).
DPROBABILITY estimates and returns the exponential rate, but in some cases it is easier to provide the mean. The third unit of
NAMES indicates whether rate or emean has been provided; if it is not set, a rate parameter is assumed.
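The two parameterizations can be illustrated in Python with the standard closed-form CDF of a normal variable plus an independent exponential variable with rate λ:

F(x) = Φ(z) - exp(-λ(x - μ) + λ²σ²/2) Φ(z - λσ), where z = (x - μ)/σ.

The function emn_cdf and its keyword arguments are hypothetical. The only fact carried over from the text is that rate = 1/emean.

```python
import math
from statistics import NormalDist

def emn_cdf(mean, sd, rate=None, emean=None):
    """CDF of an exponentially modified normal variable (normal + exponential).

    Give exactly one of `rate` or `emean`; they are related by rate = 1 / emean.
    """
    if (rate is None) == (emean is None):
        raise ValueError("give exactly one of rate or emean")
    lam = rate if rate is not None else 1.0 / emean
    phi = NormalDist().cdf  # standard normal CDF

    def cdf(x):
        z = (x - mean) / sd
        return phi(z) - math.exp(-lam * (x - mean) + 0.5 * (lam * sd) ** 2) * phi(z - lam * sd)

    return cdf

# The same distribution, specified by its rate or by the exponential mean
by_rate = emn_cdf(0.0, 1.0, rate=0.5)
by_emean = emn_cdf(0.0, 1.0, emean=2.0)
```

This direct form can overflow when λσ is large. A more robust version would evaluate the second term on the log scale.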
The DISTRIBUTION option provides the common distributions. Alternatively, for traditional tests (i.e.
TMETHOD=traditional) you can set
DISTRIBUTION=calculated to define your own distribution. You must then use the
CDFCALCULATION option to provide an expression, formed using argument
X, to calculate the cumulative distribution function. For example, the
exponential distribution with rate parameter 2 could be specified by setting options DISTRIBUTION=calculated and CDFCALCULATION to an expression that calculates 1 - exp(-2X).
Monte-Carlo simulations are used to calculate the empirical probability values of the test statistics under the likelihood-ratio based method (i.e.
TMETHOD=likelihoodratio) and, by default, under the traditional method when the parameters are estimated from the
DATA. The
NTIMES option defines how many Monte-Carlo simulations are used; default 999. The
SEED option can be set to initialize the random-number generator used during the Monte-Carlo simulations, so that calling the procedure again with the same seed and settings gives identical results. The default of zero continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, initializes the seed automatically.
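The Monte-Carlo approach described here and in the next paragraph can be sketched in Python. The example is a Kolmogorov-Smirnov test of normality, with the mean and standard deviation estimated from the data. The sketch rests on these assumptions:

- The empirical probability is taken as (count + 1)/(NTIMES + 1), so the observed statistic counts as part of the null distribution. This is consistent with the remark on plotting in the Method section, but it is not confirmed as EDFTEST's exact formula.
- A fixed seed plays the role of a non-zero SEED, so repeated runs are identical.
- All function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean, pstdev

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov statistic D for a fully specified continuous CDF."""
    n = len(data)
    u = sorted(cdf(x) for x in data)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))

def fit_normal(data):
    """Maximum-likelihood estimates: sample mean and divisor-n standard deviation."""
    return fmean(data), pstdev(data)

def ks_normal_mc(data, ntimes=999, seed=None, reestimate=True):
    """KS test of normality with estimated parameters and a Monte-Carlo p-value.

    reestimate=True refits mean and sd to every simulated data set (the default
    behaviour described in the text); reestimate=False keeps the estimates from
    the original data, in the spirit of MCPARAMETERS=fix.
    """
    mu, sigma = fit_normal(data)
    observed = ks_statistic(data, NormalDist(mu, sigma).cdf)
    rng = random.Random(seed)  # a fixed seed makes the simulations reproducible
    n, exceed = len(data), 0
    for _ in range(ntimes):
        sim = [rng.gauss(mu, sigma) for _ in range(n)]
        m, s = fit_normal(sim) if reestimate else (mu, sigma)
        if ks_statistic(sim, NormalDist(m, s).cdf) >= observed:
            exceed += 1
    # The observed statistic is counted as one member of the null distribution.
    return observed, (exceed + 1) / (ntimes + 1)

x = [0.621, 0.503, 0.203, 0.477, 0.710, 0.581, 0.329, 0.480, 0.554, 0.382]
stat, p = ks_normal_mc(x, ntimes=999, seed=1234)
```

Re-estimating within each simulation matters because fitted parameters make the data look closer to the hypothesized distribution. If the parameters were held fixed, the simulated statistics would be too large, and the probability values would be too high.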
By default, when the parameters are estimated from the
DATA, they are re-estimated for each simulated data set during the Monte-Carlo simulations, to ensure that the correct probability values are obtained. However, this can be overridden by setting the
MCPARAMETERS option to fix, so that the estimates from the original DATA are used in every simulation.
Printed output is controlled by the PRINT option, with settings:
|description|to print summary information; and|
|test|to print the test statistic(s), together with the probability value(s) under the assumption that the data are from the hypothesized distribution (so a low probability indicates that the data are unlikely to be from the hypothesized distribution).|
The printed output can be suppressed by setting option PRINT=*.
The PLOT option controls graphical output, with settings:
|histogram|to plot a histogram of the Monte-Carlo simulated test statistics; and|
|kerneldensity|to produce a kernel density plot of the Monte-Carlo simulated test statistics.|
By default, nothing is plotted.
The TITLE, YTITLE and XTITLE options can supply an overall title, a y-axis title and an x-axis title for the graphs, respectively. If these are not supplied, suitable titles are generated automatically. When a single plot is requested, you can set option
SCREEN=keep to plot the graph on an existing screen; by default the screen is cleared first. The
WINDOW option defines the window to use for the plots; default 3.
The STATISTICS, PROBABILITIES and MCSTATISTICS parameters allow the test statistics, their probabilities and the Monte-Carlo simulated test statistics, respectively, to be saved in pointers.
EDFTEST calculates the traditional Anderson-Darling, Cramér-von Mises and Kolmogorov-Smirnov goodness-of-fit tests. When
PARAMETERS are supplied (or if
MCPARAMETERS=fix), the probability of the Anderson-Darling test statistic is calculated using the fast algorithm (adinf) of Marsaglia & Marsaglia (2004), the probability of the Cramér-von Mises test statistic is calculated using the one-term linking approximation (equation 1.8) of Csörgő & Faraway (1996), and the probability of the Kolmogorov-Smirnov test statistic is calculated using the method of Carvalho (2015) for data sets with fewer than 171 values or using the Wang et al. (2003) approximation for larger data sets. When
PARAMETERS are not supplied, Monte-Carlo simulation is used by default to obtain empirical probability values of the test statistics. However, empirical probability values are not available for
EDFTEST calculates likelihood-ratio based goodness-of-fit test statistics using the method of Zhang (2002). (Note, however, that the likelihood-ratio based method is not available for
DISTRIBUTION=calculated.) The resulting tests are generally more powerful than their traditional analogues. Monte-Carlo simulation is used to obtain empirical probability values of the test statistics.
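The three likelihood-ratio based statistics defined by Zhang (2002) are Z_K, Z_A and Z_C. Each can be written directly in terms of the ordered values u(i) = F(x(i)), and large values indicate departure from the hypothesized distribution. The Python sketch below transcribes those published formulas for a fully specified distribution. It is an illustration under that assumption, not EDFTEST's code, and it omits the Monte-Carlo step that supplies the probability values. The function name zhang_statistics is hypothetical.

```python
import math

def zhang_statistics(data, cdf):
    """Zhang (2002) likelihood-ratio statistics Z_K, Z_A, Z_C for a fully
    specified continuous CDF (requires 0 < F(x) < 1 for every observation)."""
    n = len(data)
    u = sorted(cdf(x) for x in data)
    # Z_K: maximum over i of n times a Bernoulli Kullback-Leibler divergence
    zk = max(
        (i - 0.5) * math.log((i - 0.5) / (n * ui))
        + (n - i + 0.5) * math.log((n - i + 0.5) / (n * (1 - ui)))
        for i, ui in enumerate(u, start=1)
    )
    # Z_A: weighted sum of log F and log(1 - F), an Anderson-Darling analogue
    za = -sum(
        math.log(ui) / (n - i + 0.5) + math.log(1 - ui) / (i - 0.5)
        for i, ui in enumerate(u, start=1)
    )
    # Z_C: squared log-odds deviations, a Cramer-von Mises analogue
    zc = sum(
        math.log((1 / ui - 1) / ((n - 0.5) / (i - 0.75) - 1)) ** 2
        for i, ui in enumerate(u, start=1)
    )
    return zk, za, zc

# Conover (1980) sample from the example, against Uniform[0, 1]
x = [0.621, 0.503, 0.203, 0.477, 0.710, 0.581, 0.329, 0.480, 0.554, 0.382]
zk, za, zc = zhang_statistics(x, lambda v: v)
```

When every u(i) equals the plotting position (i - 0.5)/n, Z_K is zero. This gives a quick check of the transcription.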
When PARAMETERS are not supplied, maximum-likelihood estimates are obtained using the methods in the
DPROBABILITY procedure. When
MCPARAMETERS=estimate, the parameter values are re-estimated for each simulated data set using the same methods.
The kernel-density plot is generated by the
KERNELDENSITY procedure, using the method of Sheather & Jones (1991), with the default number of grid points. The simulated test statistics are plotted using red
+ symbols along the x-axis, and the location of the observed test statistic is denoted by a blue line. As the observed test statistic contributes to the null distribution, it is included in the calculation of both the kernel density and the histogram.
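A Gaussian kernel density estimate of the simulated statistics can be sketched in Python, with the observed statistic included as described above. Note that the sketch swaps in a simple normal-reference rule-of-thumb bandwidth, h = 1.06 s n^(-1/5), in place of the Sheather & Jones (1991) bandwidth that KERNELDENSITY uses. The curve will therefore differ somewhat from Genstat's plot. The function name kde_gaussian and the statistic values are hypothetical.

```python
import math
from statistics import stdev

def kde_gaussian(values, grid, bandwidth=None):
    """Gaussian kernel density estimate of `values` at each point in `grid`.

    Default bandwidth: normal-reference rule of thumb 1.06 * s * n**(-1/5),
    used here instead of the Sheather-Jones selector.
    """
    n = len(values)
    h = bandwidth if bandwidth is not None else 1.06 * stdev(values) * n ** (-0.2)
    scale = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in values) for g in grid]

simulated = [0.8, 1.1, 0.9, 1.4, 1.0, 0.7, 1.2]  # hypothetical simulated statistics
observed = 1.3                                    # hypothetical observed statistic
values = simulated + [observed]                   # observed value joins the null sample
grid = [i * 0.01 for i in range(-300, 601)]
density = kde_gaussian(values, grid)
```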
The DATA variate can be restricted to assess a subset of the data.
Carvalho, L. (2015). An improved evaluation of Kolmogorov’s distribution. Journal of Statistical Software, 65(3), 1-7.
Csörgő, S. & Faraway, J.J. (1996). The exact and asymptotic distributions of Cramér-von Mises statistics. Journal of the Royal Statistical Society, Series B, 58, 221-234.
Marsaglia, G. & Marsaglia, J. (2004). Evaluating the Anderson-Darling distribution. Journal of Statistical Software, 9(2), 1-5.
Sheather, S.J. & Jones, M.C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B, 53, 683-690.
Wang, J., Tsang, W.W. & Marsaglia, G. (2003). Evaluating Kolmogorov’s distribution. Journal of Statistical Software, 8(18), 1-4.
Zhang, J. (2002). Powerful goodness-of-fit tests based on the likelihood ratio. Journal of the Royal Statistical Society, Series B, 64, 281-294.
Commands for: Basic and nonparametric statistics.
CAPTION 'EDFTEST example',\
        !t('Random sample of size 10 assumed to come from the Uniform distribution.'),\
        !t('From W.J. Conover (1980), Practical Nonparametric Statistics 2ed, pg 348.');\
        STYLE=meta,plain,plain
VARIATE [VALUES=0.621,0.503,0.203,0.477,0.710,0.581,0.329,0.480,0.554,0.382] x
"Assuming a Uniform[0,1] distribution."
"Likelihood-ratio based tests with histograms of the Monte-Carlo test statistics."
EDFTEST [PLOT=histogram; DISTRIBUTION=uniform; PARAMETERS=!(1,0); NAMES=!t(max,min);\
        SEED=1234; NTIMES=999] x
"Traditional tests."
EDFTEST [TMETHOD=traditional; DISTRIBUTION=uniform; PARAMETERS=!(1,0);\
        NAMES=!t(max,min)] x
"Estimating parameter values from the data."
"Likelihood-ratio based tests with kernel density plots of the Monte-Carlo test statistics."
EDFTEST [PLOT=kerneldensity; DISTRIBUTION=uniform; SEED=1234; NTIMES=999] x
"Traditional tests with kernel density plots of the Monte-Carlo test statistics."
EDFTEST [TMETHOD=traditional; PLOT=kerneldensity; DISTRIBUTION=uniform; SEED=1234;\
        NTIMES=999] x