Estimates the parameter lambda of a single parameter transformation (D.M. Smith).
Options
| Option | Description |
|---|---|
| TRANSFORM = string token | Type of transformation (power, modulus, foldedpower, GuerreroJohnson, Aranda1, Aranda2, powerlogit); default powe |
| METHOD = string tokens | Method of evaluating transformation parameter lambda (Atkinson, Andrews, BoxCox, Robust); default boxc |
| K = scalar | Cut-off value for robust method; default * |
| LOWER = scalar | Lower limit of range of lambda; default * |
| UPPER = scalar | Upper limit of range of lambda; default * |
| STEPLENGTH = scalar | Increment of lambda; default (UPPER - LOWER)/20 |
| LAMBDA = scalar | Single value of lambda; default * |
| FVBOUND = string token | Replace illegal fitted values by the corresponding boundary values (no, yes); default no |
| GRAPHICS = string token | What sort of graphics to use (lineprinter, highresolution); default high |
| TERMS = formula | Terms of model |
Parameters
| Parameter | Description |
|---|---|
| Y = variates | Response variate |
| NBINOMIAL = variates | Denominator for a binomial variate |
| SAVE = pointers | Structures to save the output |
Description
This procedure evaluates the “best” value of the transformation parameter (lambda) for a range of single-parameter transformations. It offers four methods of evaluation and seven families of transformations. If a range of values of lambda is input (using the LOWER and UPPER options), plots are produced of either an F statistic or a log likelihood (on the y-axis) against lambda (on the x-axis). For the Atkinson and Andrews methods the plotted value is an F statistic, whereas for the Box-Cox and robust methods it is a log likelihood. The interval (of lambda) at which the plotted functions are evaluated can be controlled by the STEPLENGTH option. A list of methods is allowed, and the plots are arranged so that they all appear on the same screen, which makes comparison easy. By default the plots are high-resolution. Setting option GRAPHICS=lineprinter generates line-printer style (character) plots (one per page), and setting GRAPHICS=* suppresses the plots altogether. If a single value of lambda is input (using the LAMBDA option), no graphical display is produced.
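The grid of lambda values, and the kind of F statistic plotted over it for the Atkinson method, can be sketched in Python. This is a minimal illustration under stated assumptions, not YTRANSFORM's own code: it assumes the normalized power transformation z(λ) = (y^λ − 1)/(λ ẏ^(λ−1)), where ẏ is the geometric mean of y, Atkinson's constructed variable w(λ) = ∂z(λ)/∂λ, and a constant-only model (TERMS unset). It also assumes the grid stops at the last step not exceeding UPPER. The function names `lambda_grid` and `atkinson_f` are hypothetical.

```python
import math

def lambda_grid(lower, upper, step=None):
    """Lambda values from LOWER up to UPPER; step defaults to (UPPER - LOWER)/20 (assumed)."""
    if step is None:
        step = (upper - lower) / 20
    n = int(math.floor((upper - lower) / step + 1e-9))
    # Round away floating-point drift so that 0 is hit exactly when it lies on the grid.
    return [round(lower + i * step, 12) for i in range(n + 1)]

def atkinson_f(y, lam):
    """F statistic for adding the constructed variable w to a constant-only model
    fitted to the normalized power transformation z (sketch; y must be positive)."""
    n = len(y)
    logs = [math.log(v) for v in y]
    lg = sum(logs) / n                  # log of the geometric mean
    g = math.exp(lg)                    # geometric mean of y
    if abs(lam) < 1e-12:                # limiting forms at lambda = 0
        z = [g * L for L in logs]
        w = [g * L * (L / 2 - lg) for L in logs]
    else:
        scale = lam * g ** (lam - 1)
        z = [(v ** lam - 1) / scale for v in y]
        w = [v ** lam * L / scale - zi * (1 / lam + lg)
             for v, L, zi in zip(y, logs, z)]
    zbar, wbar = sum(z) / n, sum(w) / n
    szz = sum((a - zbar) ** 2 for a in z)
    sww = sum((b - wbar) ** 2 for b in w)
    szw = sum((a - zbar) * (b - wbar) for a, b in zip(z, w))
    explained = szw ** 2 / sww          # drop in RSS from adding w
    rss1 = szz - explained              # RSS of constant + w, on n - 2 df
    return explained / (rss1 / (n - 2))
```

Calling `atkinson_f(y, lam)` for each `lam` in `lambda_grid(-1.5, 0.5)` gives one curve of the kind plotted; under these assumptions, values of lambda with small F are those for which the constructed variable adds little to the model.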
The Y parameter must be set to specify the response variate, i.e. the variate being considered for transformation. For a binomial distribution the NBINOMIAL parameter must also be set. The terms in the fitted model are specified by the TERMS option, which may be set to a formula or left unset to fit a model involving only a constant term. For reasons of scale invariance, as described in Schlesselman (1971), a constant term must be included in the model.

The TRANSFORM option specifies which family of transformations is desired. It can take one of seven values. The setting power represents the power transformation family (Box & Cox 1964); modulus represents the modulus transformation family (John & Draper 1980); foldedpower the folded-power transformation family (Atkinson 1985); guerrerojohnson the Guerrero-Johnson (1982) transformation family; aranda1 and aranda2 the two Aranda-Ordaz (1981) transformation families; and powerlogit the power-logit (otherwise known as skewed logit) transformation family (Stukel 1988).

The METHOD option specifies which methods of evaluating the transformation parameter (lambda) are required. It can be set to a list of one to four values, corresponding to the four methods of evaluation that are incorporated: the added variable method of Atkinson (1982), the added variable method of Andrews (1971), the maximum likelihood method of Box & Cox (1964), and a robust method due to Carroll (1980). For this last method the scalar K is required. This is the value of the standard normal deviate (z) at which the assumed error distribution changes from a standard normal (in the centre) to an exponential (in the tails).
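Four of the families have standard closed forms that can be sketched in Python. The definitions below are the usual ones from the cited literature, assumed here rather than taken from YTRANSFORM (whose internal scaling may differ): power, (y^λ − 1)/λ with log y at λ = 0; modulus, sign(y)((|y| + 1)^λ − 1)/λ; folded power, (p^λ − (1 − p)^λ)/λ, which tends to the logit as λ → 0; and Guerrero-Johnson, the power transformation applied to the odds p/(1 − p). The Aranda-Ordaz and power-logit families are omitted. The function names are hypothetical.

```python
import math

def power(y, lam):
    """Box-Cox power family (assumed form); requires y > 0."""
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam

def modulus(y, lam):
    """John-Draper modulus family (assumed form); defined for any real y."""
    s = 1.0 if y >= 0 else -1.0
    a = abs(y) + 1
    return s * (math.log(a) if lam == 0 else (a ** lam - 1) / lam)

def folded_power(p, lam):
    """Folded power for a proportion 0 < p < 1 (assumed scaling); logit at lam = 0."""
    if lam == 0:
        return math.log(p / (1 - p))
    return (p ** lam - (1 - p) ** lam) / lam

def guerrero_johnson(p, lam):
    """Power family applied to the odds p/(1 - p), for 0 < p < 1."""
    return power(p / (1 - p), lam)
```

Each function is continuous in λ at λ = 0, which is why the logarithmic (or logit) form is used there instead of dividing by zero.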
One problem with transforming data and then fitting models is that the fitted values (of the transformed data) can go outside the legal range. If the data are binomial, proportions of zero or one are replaced inside the procedure by 0.5/NBINOMIAL and 1 - 0.5/NBINOMIAL respectively. When proportions are instead input directly in the Y variate, units with values less than or equal to zero or greater than or equal to one are ignored in the calculations. Option FVBOUND controls what happens in other circumstances when a fitted value goes outside the allowed range of the transformation. By default no action is taken but, if FVBOUND=yes, illegal fitted values are replaced by the corresponding limiting values of the transformation.
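The handling of proportions described above can be sketched in Python, together with one possible boundary rule of the FVBOUND kind. The first two functions follow the text directly. The third is an assumption for the power family only: for λ > 0 the transformed values cannot fall below −1/λ, and for λ < 0 they cannot exceed −1/λ, so the sketch replaces an illegal fitted value by −1/λ. All function names are hypothetical.

```python
def adjust_binomial(r, n):
    """Proportion r/n, with 0 and 1 moved to 0.5/n and 1 - 0.5/n (NBINOMIAL set)."""
    if r <= 0:
        return 0.5 / n
    if r >= n:
        return 1 - 0.5 / n
    return r / n

def usable_proportions(p):
    """Proportions supplied directly in Y: keep only units strictly inside (0, 1)."""
    return [v for v in p if 0 < v < 1]

def clamp_power_fitted(mu, lam):
    """Assumed FVBOUND=yes rule for the power family: move an illegal fitted value
    on the transformed scale to the limiting value -1/lam (no limit when lam = 0)."""
    if lam == 0:
        return mu                       # log scale: every real value is legal
    bound = -1 / lam
    return max(mu, bound) if lam > 0 else min(mu, bound)
```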
The values of the F statistics or log likelihoods can be saved, with the associated values of lambda, using the SAVE parameter. This returns a pointer with four elements. The first three are texts specifying, respectively, the transformation family (SAVE[1], one value), the value of FVBOUND (SAVE[2], one value) and the methods used (SAVE[3], one to four values). The fourth element (SAVE[4]) is a matrix of results with dimensions (number of values of lambda evaluated) × (number of methods + 1). Column 1 of this matrix contains the evaluated values of lambda, column 2 the values (F statistics or log likelihoods) for the first method requested, and so on for the other methods. If the LAMBDA option is used, the matrix has only one row.
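The layout of the results matrix in SAVE[4] can be mirrored in Python as a list of rows, which may help when post-processing the saved values outside Genstat. This is only a sketch of the layout described above; `results_matrix` and its arguments are hypothetical names.

```python
def results_matrix(lambdas, method_values):
    """Rows of [lambda, value for method 1, value for method 2, ...].

    method_values holds one list per requested method, in the order requested,
    each with one entry per value of lambda.
    """
    for vals in method_values:
        if len(vals) != len(lambdas):
            raise ValueError("each method needs one value per lambda")
    return [[lam] + [vals[i] for vals in method_values]
            for i, lam in enumerate(lambdas)]
```

With a single LAMBDA value the result has one row, matching the description above.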
Full details of the methodology implemented are given by Smith (2002).
Options: TRANSFORM, METHOD, K, LOWER, UPPER, STEPLENGTH, LAMBDA, FVBOUND, GRAPHICS, TERMS.
Parameters: Y, NBINOMIAL, SAVE.
Method
Much of the methodology implemented is based on that described and reviewed in Atkinson (1985) and Cook & Weisberg (1982). The four methods of evaluation are the added variable method of Atkinson (1982), the added variable method of Andrews (1971), the maximum likelihood method of Box & Cox (1964), and a robust method (based on maximum likelihood) due to Carroll (1980). The seven transformations are the power transformation of Box & Cox (1964), the modulus transformation of John & Draper (1980), the folded-power transformation (as expounded in Atkinson 1985), the Guerrero-Johnson (1982) transformation, the two transformations of Aranda-Ordaz (1981), and the power-logit (otherwise known as skewed logit) transformation of Stukel (1988). The log likelihood produced for the Box & Cox method differs from that given by Box & Cox (1964), because they omit the constant term N/2. YTRANSFORM includes this term for compatibility with Carroll’s robust method, which reduces to Box & Cox’s method as K tends to infinity.
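The relationship between the two log likelihoods can be sketched in Python for a constant-only model. This sketch assumes that the extra term is −n/2 (the value that −Σr²/(2σ̂²) takes at σ̂² = RSS/n), omits the −(n/2)log(2π) term, and assumes that Carroll's criterion replaces r²/2 by Huber's ρ function with cut-off K: quadratic for |z| ≤ K, linear beyond. For finite K it simply evaluates ρ at the least-squares location and scale, whereas Carroll (1980) estimates these robustly, so finite-K values are illustrative only. The function names are hypothetical.

```python
import math

def _power(y, lam):
    """Box-Cox power transformation of a list of positive values."""
    return [math.log(v) if lam == 0 else (v ** lam - 1) / lam for v in y]

def huber_rho(z, k):
    """Huber's rho: z^2/2 for |z| <= k (normal centre), linear beyond (exponential tails)."""
    a = abs(z)
    return 0.5 * z * z if a <= k else k * a - 0.5 * k * k

def boxcox_paper_loglik(y, lam):
    """Maximized log likelihood as given by Box & Cox (1964), constant-only model."""
    n = len(y)
    z = _power(y, lam)
    mu = sum(z) / n
    s2 = sum((v - mu) ** 2 for v in z) / n
    return -0.5 * n * math.log(s2) + (lam - 1) * sum(math.log(v) for v in y)

def rho_loglik(y, lam, k=math.inf):
    """-n*log(sigma) - sum(rho(r/sigma)) + Jacobian term, at the least-squares fit.
    With k = inf, rho(z) = z^2/2 and the middle term equals -n/2."""
    n = len(y)
    z = _power(y, lam)
    mu = sum(z) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in z) / n)
    jacobian = (lam - 1) * sum(math.log(v) for v in y)
    return -n * math.log(sigma) - sum(huber_rho((v - mu) / sigma, k) for v in z) + jacobian
```

For the 48 survival times used in the example, n/2 = 24, the constant mentioned in the example's caption.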
Action with RESTRICT
If the Y variate is restricted, the analysis uses only the units not excluded by the restriction.
References
Andrews, D.F. (1971). A note on the selection of data transformations. Biometrika, 58, 249-54.
Aranda-Ordaz, F.J. (1981). On two families of transformation to additivity for binary response data. Biometrika, 68, 357-63.
Atkinson, A.C. (1982). Regression diagnostics, transformations and constructed variables (with discussion). Journal of the Royal Statistical Society, Series B, 44, 1-36.
Atkinson, A.C. (1985). Plots, Transformations and Regression. Oxford University Press, Oxford.
Box, G.E.P. & Cox, D.R. (1964). An analysis of transformations (with discussion). Journal of the Royal Statistical Society, Series B, 26, 211-46.
Carroll, R.J. (1980). A robust method for testing transformations to achieve approximate normality. Journal of the Royal Statistical Society, Series B, 42, 71-78.
Cook, R.D. & Weisberg, S. (1982). Residuals and Influence in Regression. Chapman & Hall, New York.
Guerrero, V.M. & Johnson, R.A. (1982). Use of Box-Cox transformation with binary response models. Biometrika, 69, 309-14.
John, J.A. & Draper, N.R. (1980). An alternative family of transformations. Applied Statistics, 29, 190-97.
Schlesselman, J. (1971). Power families: a note on the Box and Cox transformation. Journal of the Royal Statistical Society, Series B, 33, 307-311.
Smith, D.M. (2002). Computing single parameter transformations. Communications in Statistics – Simulation and Computation, 32, 605-618.
Stukel, T.A. (1988). Generalized logistic models. Journal of the American Statistical Association, 83, 426-31.
See also
Directive: CALCULATE.
Procedure: ABOXCOX.
Commands for: Calculations and manipulation, Regression analysis.
Example
```
CAPTION  'YTRANSFORM example',!t('Data from Box & Cox',\
         '(1964), J.R. Statist. Soc. B, 26, 211-46. Y is survival',\
         'time (unit 10 hours) of animals, D is poison dose and T is',\
         'treatment. Note, the results for METHOD=BoxCox differ',\
         'from those in the paper by the constant value 24 (= n/2).');\
         STYLE=meta,plain
FACTOR   [LEVELS=3; VALUES=16(1...3)] D
&        [LEVELS=4; VALUES=(1...4)12] T
VARIATE  [VALUES=0.31,0.82,0.43,0.45,0.45,1.10,0.45,0.71,0.46,0.88,\
         0.63,0.66,0.43,0.72,0.76,0.62,0.36,0.92,0.44,0.56,\
         0.29,0.61,0.35,1.02,0.40,0.49,0.31,0.71,0.23,1.24,\
         0.40,0.38,0.22,0.30,0.23,0.30,0.21,0.37,0.25,0.36,\
         0.18,0.38,0.24,0.31,0.23,0.29,0.22,0.33] Y
YTRANSFORM [TERMS=T+D; METHOD=Atkinson,Andrews; LAMBDA=1.0]\
         Y; SAVE=!P(Transform,Restriction,Methods,Results)
FOR [NTIMES=1]
  PRINT [IPRINT=*] 'Transformation:',Transform; JUSTIFICATION=left
  &     [IPRINT=*] 'Restriction:',Restriction; JUSTIFICATION=left
  &     [IPRINT=*; SERIAL=yes; ORIENTATION=across]\
        Methods; FIELDWIDTH=13
  PRINT [IPRINT=*; RLPRINT=*; CLPRINT=*; SQUASH=yes] Results; FIELDWIDTH=13
ENDFOR
YTRANSFORM [TERMS=T+D; METHOD=BoxCox,robust; K=2; LOWER=-1.5; UPPER=0.5]\
         Y; SAVE=Save
FOR [NTIMES=1]
  PRINT [IPRINT=*; STYLE=plain] 'Transformation:',Save[1]; JUSTIFICATION=le
  &     'Restriction:',Save[2]; JUSTIFICATION=left
  &     [SERIAL=yes; STYLE=form; ORIENTATION=across]\
        Save[3]; FIELDWIDTH=13; SKIP=0
  PRINT [IPRINT=*; RLPRINT=*; CLPRINT=*; SQUASH=yes] Save[4]; FIELDWIDTH=13
ENDFOR
```