YTRANSFORM procedure

Estimates the parameter lambda of a single-parameter transformation (D.M. Smith).

Options

TRANSFORM = string token Type of transformation (power, modulus, foldedpower, GuerreroJohnson, Aranda1, Aranda2, powerlogit); default power
METHOD = string tokens Method of evaluating transformation parameter lambda (Atkinson, Andrews, BoxCox, Robust); default BoxCox
K = scalar Cut-off value for robust method; default *
LOWER = scalar Lower limit of range of lambda; default *
UPPER = scalar Upper limit of range of lambda; default *
STEPLENGTH = scalar Increment of lambda; default (UPPER-LOWER)/20
LAMBDA = scalar Single value of lambda; default *
FVBOUND = string token Replace illegal fitted values by the corresponding boundary values (no, yes); default no
GRAPHICS = string token What sort of graphics to use (lineprinter, highresolution); default highresolution
TERMS = formula Terms of model

Parameters

Y = variates Response variate
NBINOMIAL = variates Denominator for a binomial variate
SAVE = pointers Structures to save the output

Description

This procedure evaluates the “best” value of the transformation parameter (lambda) for a range of single-parameter transformations. It offers four methods of evaluation and seven families of transformations. If a range of values of lambda is supplied (using the LOWER and UPPER options), plots are produced of either an F statistic or a log likelihood on the y-axis against lambda on the x-axis: an F statistic for the Atkinson and Andrews methods, and a log likelihood for the Box-Cox and robust methods. The spacing of the lambda values at which the plotted functions are evaluated is controlled by the STEPLENGTH option. A list of methods is allowed, and the plots are all produced on the same screen to make comparison easy. By default the plots are high-resolution. Setting option GRAPHICS=lineprinter generates line-printer style (character) plots (one per page), and setting GRAPHICS=* suppresses the plots altogether. If a single value of lambda is supplied (using the LAMBDA option), no graphical display is produced.
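As a minimal sketch of a range scan, the call below evaluates the Box-Cox log likelihood over a grid of lambda values with the plots suppressed. It assumes a response variate Y and factors T and D like those defined in the Example section; the limits, the step length and the pointer name Scan are illustrative choices only.

```genstat
" Scan lambda from -2 to 2 in steps of 0.25 (illustrative values).
  GRAPHICS=* suppresses the plots. "
YTRANSFORM [TERMS=T+D; METHOD=BoxCox; LOWER=-2; UPPER=2; STEPLENGTH=0.25;\
           GRAPHICS=*] Y; SAVE=Scan
" Scan is a hypothetical pointer name; its fourth element holds the
  evaluated values of lambda and the corresponding log likelihoods. "
PRINT      [RLPRINT=*; CLPRINT=*] Scan[4]; FIELDWIDTH=13
```

If both end points are evaluated (an assumption, not stated above), this grid contains 17 values of lambda.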

The Y parameter must be set to specify the response variate, i.e. the variate being considered for transformation. For a binomial distribution the NBINOMIAL parameter must also be set. The terms in the fitted model are specified by the TERMS option, which may be set to a formula or left unset to fit a model involving only a constant term. For reasons of scale invariance, as described in Schlesselman (1971), a constant term must be included in the model.

The TRANSFORM option specifies which family of transformations is required. It can take one of seven values. The setting power represents the power transformation family (Box & Cox 1964); modulus represents the modulus transformation family (John & Draper 1980); foldedpower the folded-power transformation family (Atkinson 1985); guerrerojohnson the Guerrero-Johnson (1982) transformation family; aranda1 and aranda2 the two Aranda-Ordaz (1981) transformation families; and powerlogit the power-logit (otherwise known as skewed logit) transformation family (Stukel 1988).

The METHOD option specifies which methods of evaluating the transformation parameter (lambda) are required. It can be set to a list of one to four values. The four methods available are the added variable method of Atkinson (1982), the added variable method of Andrews (1971), the maximum likelihood method of Box & Cox (1964), and a robust method due to Carroll (1980). The robust method requires a scalar, supplied by the K option. This is the standard normal deviate value (z) at which the distribution changes from a standard normal to an exponential.
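For orientation, the power and modulus families are usually written as follows in the cited papers. This is the standard textbook form; the exact parameterisation coded inside YTRANSFORM is not stated on this page, so treat it as an assumption to be checked against Smith (2002).

```latex
% Power family (Box & Cox 1964), defined for y > 0
\[
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
  \log y, & \lambda = 0.
\end{cases}
\]
% Modulus family (John & Draper 1980), defined for any real y
\[
y^{(\lambda)} =
\begin{cases}
  \operatorname{sign}(y)\,\dfrac{(|y| + 1)^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
  \operatorname{sign}(y)\,\log(|y| + 1), & \lambda = 0.
\end{cases}
\]
```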

One problem with transforming data and then fitting models is that the fitted values (of the transformed data) can fall outside the legal range. If the data are binomial, proportions of zero or one are replaced inside the procedure by 0.5/NBINOMIAL and 1 – 0.5/NBINOMIAL respectively. In contrast, when proportions are input directly in the Y variate, units with values less than or equal to zero or greater than or equal to one are ignored in the calculations. Option FVBOUND controls what happens in other circumstances when a fitted value goes outside the allowed range of the transformation. By default no action is taken but, if FVBOUND=yes, illegal fitted values are replaced by the corresponding limiting values of the transformation.
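The boundary replacement for binomial data can be illustrated with a small calculation. The sketch below only mimics the rule described above using CALCULATE; it is not the procedure's internal code, and the identifiers R, N, P and Padj and their values are hypothetical. With 20 trials per unit, a proportion of 0 becomes 0.5/20 = 0.025 and a proportion of 1 becomes 1 – 0.5/20 = 0.975, while interior proportions are unchanged.

```genstat
" Hypothetical binomial data: R successes out of N trials per unit. "
VARIATE   [VALUES=0,5,20] R
VARIATE   [VALUES=20,20,20] N
" Observed proportions, with 0 and 1 moved in from the boundary by 0.5/N.
  The logical tests give 1 where true and 0 where false. "
CALCULATE P = R/N
CALCULATE Padj = P + (P==0)*0.5/N - (P==1)*0.5/N
PRINT     R,N,P,Padj; DECIMALS=0,0,3,3
```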

The values of the F statistics or log likelihoods can be saved, with the associated values of lambda, using the SAVE parameter. This returns a pointer containing four elements. The first three of these are texts specifying, respectively, the transformation family (SAVE[1], one value), the value of FVBOUND (SAVE[2], one value) and the methods used (SAVE[3], one to four values). The fourth element (SAVE[4]) is a matrix of results with one row for each value of lambda evaluated, and one more column than the number of methods. Column 1 of this matrix contains the evaluated values of lambda, column 2 has the values (F statistics or log likelihoods) for the first method requested, and so on for the other methods. If the option LAMBDA is used, this matrix has only one row.

Full details of the methodology implemented are given by Smith (2002).

Options: TRANSFORM, METHOD, K, LOWER, UPPER, STEPLENGTH, LAMBDA, FVBOUND, GRAPHICS, TERMS.

Parameters: Y, NBINOMIAL, SAVE.

Method

Much of the methodology implemented is based on that described and reviewed in Atkinson (1985) and Cook & Weisberg (1982). The four methods of evaluation are the added variable method of Atkinson (1982), the added variable method of Andrews (1971), the maximum likelihood method of Box & Cox (1964), and a robust method (based on maximum likelihood) due to Carroll (1980). The seven transformations are the power transformation of Box & Cox (1964), the modulus transformation of John & Draper (1980), the folded-power transformation (as expounded in Atkinson 1985), the Guerrero-Johnson (1982) transformation, the two transformations of Aranda-Ordaz (1981), and the power-logit (otherwise known as skewed logit) transformation of Stukel (1988). The log likelihood produced for the Box & Cox method differs from that given by Box & Cox (1964), because they omit the constant term N/2. YTRANSFORM includes this term for compatibility with Carroll’s robust method, which reduces to Box & Cox’s method as K becomes infinite.
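To make the constant concrete for the power family: the maximised log likelihood of Box & Cox (1964) for N units can be written as in the first line below, where the variance estimate is the residual sum of squares of the transformed response divided by N. The second line is an assumption about how the N/2 term enters (with a negative sign, as in the fully maximised normal log likelihood); this page states only that the two quantities differ by N/2. The symbols L_BC and L_YT are labels introduced here for illustration.

```latex
\[
L_{\mathrm{BC}}(\lambda) = -\frac{N}{2}\log\hat{\sigma}^{2}(\lambda)
  + (\lambda - 1)\sum_{i=1}^{N}\log y_i
\]
\[
L_{\mathrm{YT}}(\lambda) = L_{\mathrm{BC}}(\lambda) - \frac{N}{2}
  \qquad\text{(sign assumed)}
\]
```

For the data in the Example section N = 48, so the two differ by 24, as noted in the example caption.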

Action with RESTRICT

If the Y variate is restricted, the analysis will use only the units not excluded by the restriction.
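A minimal sketch of this behaviour, assuming the variate Y and factors T and D from the Example section. The condition and the value of LAMBDA are arbitrary choices made only to show the mechanics.

```genstat
" Restrict Y to units with survival time below 1.2 (illustrative only). "
RESTRICT   Y; CONDITION=Y<1.2
" The analysis now uses only the units retained by the restriction. "
YTRANSFORM [TERMS=T+D; METHOD=Atkinson; LAMBDA=-1] Y
" Remove the restriction afterwards. "
RESTRICT   Y
```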

References

Andrews, D.F. (1971). A note on the selection of data transformations. Biometrika, 58, 249-54.

Aranda-Ordaz, F.J. (1981). On two families of transformation to additivity for binary response data. Biometrika, 68, 357-63.

Atkinson, A.C. (1982). Regression diagnostics, transformations and constructed variables (with discussion). Journal of the Royal Statistical Society, Series B, 44, 1-36.

Atkinson, A.C. (1985). Plots, Transformations and Regression. Oxford University Press, Oxford.

Box, G.E.P. & Cox, D.R. (1964). An analysis of transformations (with discussion). Journal of the Royal Statistical Society, Series B, 26, 211-46.

Carroll, R.J. (1980). A robust method for testing transformations to achieve approximate normality. Journal of the Royal Statistical Society, Series B, 42, 71-78.

Cook, R.D. & Weisberg, S. (1982). Residuals and Influence in Regression. Chapman & Hall, New York.

Guerrero, V.M. & Johnson, R.A. (1982). Use of Box-Cox transformation with binary response models. Biometrika, 69, 309-14.

John, J.A. & Draper, N.R. (1980). An alternative family of transformations. Applied Statistics, 29, 190-97.

Schlesselman, J. (1971). Power families: a note on the Box and Cox transformation. Journal of the Royal Statistical Society, Series B, 33, 307-311.

Smith, D.M. (2002). Computing single parameter transformations. Communications in Statistics – Simulation and Computation, 32, 605-618.

Stukel, T.A. (1988). Generalized logistic models. Journal of the American Statistical Association, 83, 426-31.

See also

Directive: CALCULATE.

Procedure: ABOXCOX.

Commands for: Calculations and manipulation, Regression analysis.

Example

CAPTION    'YTRANSFORM example',!t('Data from Box & Cox',\
           '(1964), J.R. Statist. Soc. B, 26, 211-46. Y is survival',\
           'time (unit 10 hours) of animals, D is poison dose and T is',\
           'treatment. Note, the results for METHOD=BoxCox differ',\
           'from those in the paper by the constant value 24 (= n/2).');\ 
           STYLE=meta,plain
FACTOR     [LEVELS=3; VALUES=16(1...3)] D
&          [LEVELS=4; VALUES=(1...4)12] T
VARIATE    [VALUES=0.31,0.82,0.43,0.45,0.45,1.10,0.45,0.71,0.46,0.88,\
                   0.63,0.66,0.43,0.72,0.76,0.62,0.36,0.92,0.44,0.56,\
                   0.29,0.61,0.35,1.02,0.40,0.49,0.31,0.71,0.23,1.24,\
                   0.40,0.38,0.22,0.30,0.23,0.30,0.21,0.37,0.25,0.36,\
                   0.18,0.38,0.24,0.31,0.23,0.29,0.22,0.33] Y
YTRANSFORM [TERMS=T+D; METHOD=Atkinson,Andrews; LAMBDA=1.0]\ 
           Y; SAVE=!P(Transform,Restriction,Methods,Results)
FOR [NTIMES=1]
  PRINT    [IPRINT=*] 'Transformation:',Transform; JUSTIFICATION=left
  &        [IPRINT=*] 'Restriction:',Restriction; JUSTIFICATION=left
  &        [IPRINT=*; SERIAL=yes; ORIENTATION=across]\ 
           Methods; FIELDWIDTH=13
  PRINT    [IPRINT=*; RLPRINT=*; CLPRINT=*; SQUASH=yes] Results; FIELDWIDTH=13
ENDFOR

YTRANSFORM [TERMS=T+D; METHOD=BoxCox,robust; K=2; LOWER=-1.5; UPPER=0.5]\ 
           Y; SAVE=Save
FOR [NTIMES=1]
  PRINT    [IPRINT=*; STYLE=plain] 'Transformation:',Save[1]; JUSTIFICATION=left
  &        'Restriction:',Save[2]; JUSTIFICATION=left
  &        [SERIAL=yes; STYLE=form; ORIENTATION=across]\ 
           Save[3]; FIELDWIDTH=13; SKIP=0
  PRINT    [IPRINT=*; RLPRINT=*; CLPRINT=*; SQUASH=yes] Save[4]; FIELDWIDTH=13
ENDFOR
Updated on March 4, 2019
