Fits models for Wadley’s problem, allowing alternative links and errors (D.M. Smith).
Options
| Option | Description |
|---|---|
| PRINT = string tokens | Controls printed output (deviance, estimates, correlations, monitoring); default devi, esti |
| DISTRIBUTION = string token | Distribution of the response variate (poisson, negativebinomial, qlnegativebinomial, qlscaledpoisson); default pois |
| LINK = string token | Link transformation (logit, probit, complementaryloglog, cauchit); default logi |
| TERMS = formula | Model to be fitted |
| CONTROL = factor | Factor to distinguish the control, or zero, dose (level 1) from the other treatments (level 2) |
| MAXIMAL = factor | Factor to define the maximal model, i.e. with a level for every combination of values of the variates and factors in TERMS |
| RMETHOD = string token | Type of residuals to be formed (deviance, Pearson); default devi |
Parameters
| Parameter | Description |
|---|---|
| Y = variates | Response variate for each fit |
| RESIDUALS = variates | Variate to save the residuals from each fit |
| FITTEDVALUES = variates | Variate to save the fitted values from each fit |
Description
WADLEY uses the generalized linear models methodology of composite link functions to fit a range of models for the situation known as Wadley’s problem. This arises in bioassay when it is possible to count only the subjects that have not responded to a particular dose of a drug or stimulus. For example, with insect eggs fumigated in grain, it is generally possible to count only those that survive and hatch.
By default, the analysis assumes that the number of subjects treated in each observation follows a Poisson distribution with a common mean parameter. Other distributions can be specified using the DISTRIBUTION option. For user-defined distributions, leave DISTRIBUTION unset and provide subsidiary procedure WADDISTRIBUTION (see the details of the procedures called by WADLEY).
The analysis estimates the mean of this distribution and then fits the dose-response curve as in an ordinary probit analysis. The LINK option defines the transformation (logit, probit, cauchit, or complementary log-log) required to make the model additive. User-defined transformations can also be specified. To do this, leave LINK unset and provide two subsidiary procedures: WADLINK, to calculate the necessary fitted values and derivatives, and WADINITIAL, to calculate initial values for the linear predictor (see the details of the procedures called by WADLEY). The model to be fitted is defined by the TERMS option.
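The structure of the model can be made concrete with a minimal Python sketch (not Genstat, and not WADLEY's own code) of the expected count in a single observation. It assumes, following the WADLINK examples in the Method section, that a control observation has an expected count equal to the common mean. A treated observation has that mean multiplied by one minus the probability of responding. The response distributions are the standard ones that the LINK settings name. The direction used for the complementary log-log and cauchit links is our assumption. The names `expected_count` and `RESPONSE_CDF` are hypothetical.

```python
import math

# Standard cumulative distribution functions giving the probability of
# *responding* at linear predictor eta, one per built-in LINK setting.
def _logit_cdf(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))

def _probit_cdf(eta: float) -> float:
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def _cloglog_cdf(eta: float) -> float:
    return 1.0 - math.exp(-math.exp(eta))

def _cauchit_cdf(eta: float) -> float:
    return 0.5 + math.atan(eta) / math.pi

RESPONSE_CDF = {
    "logit": _logit_cdf,
    "probit": _probit_cdf,
    "complementaryloglog": _cloglog_cdf,
    "cauchit": _cauchit_cdf,
}

def expected_count(mean_total: float, eta: float, is_control: bool,
                   link: str = "logit") -> float:
    """Expected number of non-responders (e.g. hatched eggs) in one observation.

    mean_total -- common mean number of subjects treated per observation
    eta        -- linear predictor from the TERMS model (unused for controls)
    """
    if is_control:
        return mean_total  # controls: TA = 1 in the WADLINK examples
    return mean_total * (1.0 - RESPONSE_CDF[link](eta))
```

For example, `expected_count(200.0, 0.0, False)` gives 100 for the logit, probit and cauchit links, because each of them puts half the probability below zero.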
To assist the estimation of the expected total number of subjects, there must be some control observations, for example with zero doses of fumigant. These must be identified by a factor, specified by the CONTROL option, with level 1 for untreated and level 2 for treated observations. The comparison between the treated and untreated levels of CONTROL must not be aliased with any of the variates and factors in TERMS. (Thus if, for example, TERMS contained a factor representing different types of drug, this factor must not have a separate level for the untreated observations.)
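The aliasing requirement can be checked before calling WADLEY. The Python sketch below rests on our own assumption that the contrast is aliased when adding a treated/untreated indicator column to the constant and the TERMS columns does not increase the rank of the design matrix. It uses exact rational arithmetic (`fractions`) to avoid choosing a numerical tolerance. All function names are hypothetical. The `drug` factor is an invented counter-example of the kind that the text warns against.

```python
import math
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def dummies(levels):
    """Indicator columns for every level except the lowest one."""
    others = sorted(set(levels))[1:]
    return [[1 if v == u else 0 for u in others] for v in levels]

def control_is_aliased(control, term_rows):
    """True if the treated-vs-untreated contrast of CONTROL (levels 1/2) is a
    linear combination of the constant and the TERMS columns."""
    base = [[1] + list(row) for row in term_rows]
    full = [row + [1 if c == 2 else 0] for row, c in zip(base, control)]
    return matrix_rank(full) == matrix_rank(base)

# Layout of the Example: 10 controls, then doses 1, 5, 10, 50 (5 units each)
# for each of three groups; the controls are put in Group level 1.
dose = [0] * 10 + ([1] * 5 + [5] * 5 + [10] * 5 + [50] * 5) * 3
group = [1] * 30 + [2] * 20 + [3] * 20
control = [1 + (d > 0) for d in dose]
ldose = [math.log(d + (d == 0)) for d in dose]

ok_terms = [g + [x] for g, x in zip(dummies(group), ldose)]
print(control_is_aliased(control, ok_terms))       # False: Group+LDose is usable

# Hypothetical drug factor with its own level (0) for the untreated units
drug = [0 if d == 0 else g for d, g in zip(dose, group)]
print(control_is_aliased(control, dummies(drug)))  # True: not allowed
```

Group+LDose escapes aliasing only because the controls share Group level 1 and LDose = 0 with the dose-1 units of group 1, yet differ from them in CONTROL.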
With this sort of data, the variability is often found to exceed that expected from the distribution assumed for the data (overdispersion). To estimate the amount of overdispersion, the MAXIMAL option must be set to a factor with a different level for every combination of values of the factors and variates in the TERMS model.
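The Python sketch below shows one way to build such a factor, using a hypothetical helper `maximal_factor`. In the Example, TERMS=Group+LDose gives the controls the same Group level and LDose value (log 1 = 0) as the dose-1 observations of group 1. The Full factor there nevertheless has 13 levels, with the controls on their own. This sketch therefore assumes that CONTROL is included in the combination as well.

```python
import math

def maximal_factor(*columns):
    """Give each distinct combination of values across the columns its own
    level (1, 2, ...), numbered in order of first appearance."""
    codes = {}
    levels = []
    for key in zip(*columns):
        if key not in codes:
            codes[key] = len(codes) + 1
        levels.append(codes[key])
    return levels

# Layout of the Example (10 controls, then 3 groups x 4 doses x 5 replicates)
dose = [0] * 10 + ([1] * 5 + [5] * 5 + [10] * 5 + [50] * 5) * 3
group = [1] * 30 + [2] * 20 + [3] * 20
control = [1 + (d > 0) for d in dose]
ldose = [math.log(d + (d == 0)) for d in dose]

full = maximal_factor(control, group, ldose)
print(max(full))                          # 13, as for Full in the Example
print(max(maximal_factor(group, ldose)))  # 12: controls merge with dose 1 of group 1
```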
Options: PRINT, DISTRIBUTION, LINK, TERMS, CONTROL, MAXIMAL, RMETHOD.
Parameters: Y, RESIDUALS, FITTEDVALUES.
Method
In essence, WADLEY is a specific application of composite link functions in generalized linear models. The methods used are those of the Genstat procedure GLM (Lane 1989) and the GLIM macros of Smith & Morgan (1989). The procedure is very similar in spirit to these GLIM macros, and Smith & Morgan (1989) should be consulted for further information. However, there are some extensions. The capability to handle user-defined links and distributions has been added. Also, the range of distributions has been extended to include two forms of quasi-likelihood. In the first, the weighting is of negative binomial form, weight = 1/(1 + hf × fitted value). In the second, the weighting is of scaled Poisson form, weight = 1/hf. Here hf is the heterogeneity factor. If the estimated heterogeneity factor is less than zero in the negative binomial cases, it is set to zero; if it is less than one in the scaled Poisson case, it is set to one.
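The following Python sketch shows these two weighting schemes and the truncation rule. It is written from the description above rather than from the procedure's source. It does not cover how hf itself is estimated, and the function names are hypothetical.

```python
def truncated_hf(hf: float, distribution: str) -> float:
    """Apply the lower bounds described for WADLEY's heterogeneity factor."""
    if distribution in ("negativebinomial", "qlnegativebinomial"):
        return max(hf, 0.0)  # negative estimates are set to zero
    if distribution == "qlscaledpoisson":
        return max(hf, 1.0)  # estimates below one are set to one
    raise ValueError(f"no heterogeneity factor for {distribution!r}")

def ql_weight(fitted: float, hf: float, distribution: str) -> float:
    """Iterative weight for the two quasi-likelihood forms."""
    hf = truncated_hf(hf, distribution)
    if distribution == "qlnegativebinomial":
        return 1.0 / (1.0 + hf * fitted)
    if distribution == "qlscaledpoisson":
        return 1.0 / hf
    raise ValueError(f"not a quasi-likelihood distribution: {distribution!r}")
```

For example, with a fitted value of 10 and hf = 0.1 the negative binomial weight is 1/(1 + 1) = 0.5. An estimate of hf = -0.3 is truncated to zero, which gives weight 1.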
WADLEY has two subsidiary procedures, WADCODI and WADFIT, to assist with the analysis; neither of these needs to be modified by the user:
WADCODI prints the results of the iterative processes;
WADFIT performs the iterative model fits.
There are also three other procedures, which can be rewritten or replaced, to cater for further user-defined distributions and links:
WADDISTRIBUTION calculates the variance function and deviance for a user-defined distribution;
WADINITIAL calculates initial estimates of the linear predictor for a user-defined link;
WADLINK calculates the fitted values and derivatives for a user-defined link.
If the DISTRIBUTION option is unset, the procedure calls WADDISTRIBUTION instead of using one of the standard distributions. For a Poisson error distribution, WADDISTRIBUTION would be defined as follows.
PROCEDURE 'WADDISTRIBUTION'
"Calculation of variance function and deviance"
PARAMETER 'Y', "Input: variate; response variate"\
'FITTED', "Input: variate; fitted values"\
'VARIANCE',"Output: variate; variance"\
'LL', "Output: variate; log likelihood variate"\
'DEVIANCE';"Output: scalar; total deviance"\
MODE=p
SCALAR two; VALUE=2
CALCULATE VARIANCE = FITTED
& LL = Y*LOG(Y/FITTED)-Y+FITTED
& DEVIANCE = two*SUM(LL)
ENDPROC
For other error distributions, only the three CALCULATE statements need to be changed.
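The Python sketch below mirrors the three CALCULATE statements, for anyone checking a replacement WADDISTRIBUTION against hand calculations. It adds one thing of its own: the usual convention y·log(y/μ) = 0 when y = 0, which the Genstat statements do not handle explicitly. The function name is hypothetical.

```python
import math

def poisson_variance_and_deviance(y, fitted):
    """Mirror of the three CALCULATE statements in WADDISTRIBUTION.

    Returns (variance, ll, deviance): variance = fitted; ll holds
    y*log(y/fitted) - y + fitted per unit; deviance = 2*sum(ll).
    """
    variance = list(fitted)
    ll = []
    for yi, mi in zip(y, fitted):
        # Convention y*log(y/mu) = 0 when y = 0 (added in this sketch).
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        ll.append(term - yi + mi)
    return variance, ll, 2.0 * sum(ll)
```

For a different distribution, only the variance expression and the per-unit contribution `ll` would change, matching the three CALCULATE statements.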
Similarly, if the LINK option is unset, WADINITIAL and WADLINK will be called. For a logit link, WADINITIAL would be defined as follows.
PROCEDURE 'WADINITIAL'
"Calculation of initial estimates of linear predictor"
PARAMETER 'Y', "Input: variate; response variate"\
'LP', "Output: variate; linear predictor"\
'IND', "Input: variate; marker variate with value 1
for a treated observation, 0 for a control"\
'MAXY'; "Input: scalar; estimate of asymptote"\
MODE=p
SCALAR half,one; VALUE=0.5,1
CALCULATE LP = IND*LOG(MAXY/(Y+half)-one)
ENDPROC
For other links, only the CALCULATE statement needs to be changed; for example, a probit link would require the statement
CALCULATE LP = IND*NED(one-(Y+one)/MAXY)
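The Python sketch below mirrors the logit and probit initial values. It assumes that IND is 1 for treated observations and 0 for controls, consistent with the required form of TA given for WADLINK. It also takes NED to be the inverse of the standard normal distribution function. For controls the product with IND is zero, so the sketch returns 0 directly instead of taking a logarithm that may be undefined when a control count exceeds MAXY. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def initial_lp_logit(y: float, ind: int, maxy: float) -> float:
    """LP = IND*LOG(MAXY/(Y+0.5)-1), as in the logit WADINITIAL."""
    if ind == 0:
        return 0.0  # controls: the product with IND is zero
    return math.log(maxy / (y + 0.5) - 1.0)

def initial_lp_probit(y: float, ind: int, maxy: float) -> float:
    """LP = IND*NED(1-(Y+1)/MAXY), with NED as the inverse normal CDF."""
    if ind == 0:
        return 0.0
    # inv_cdf raises StatisticsError unless 0 < 1-(y+1)/maxy < 1
    return NormalDist().inv_cdf(1.0 - (y + 1.0) / maxy)
```

With MAXY = 200, a treated count of 39.5 gives a logit starting value of log(200/40 - 1) = log 4. Fewer survivors give a larger (more positive) linear predictor.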
For a logit link, WADLINK would be defined as follows.
PROCEDURE 'WADLINK'
"Calculation of fitted values and derivatives
of the link function given the linear predictor"
PARAMETER 'LP', "Input: variate; linear predictor"\
'IND', "Input: variate; marker variate with value 1
for a treated observation, 0 for a control"\
'TA', "Output: variate; estimate of fitted values"\
'TB', "Output: variate; estimate of derivatives"\
'MAXY'; "Input: scalar; estimate of asymptote"\
MODE=p
SCALAR half,one; VALUE=0.5,1
CALCULATE TA = (.NOT.IND)+IND/(one+EXP(LP))
& TB = MAXY*EXP(LP)*TA*TA
ENDPROC
For other links, only the CALCULATE statements need to be changed; for example, a probit link would require
CALCULATE TA = (.NOT.IND)+IND*(one-NORMAL(LP))
& TB = MAXY*EXP(-half*LP*LP)/ROOT2PI
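where ROOT2PI is a scalar holding the value √(2π). It must also be defined within the procedure, for example by adding it to the SCALAR statement that defines half and one. The marker variate IND distinguishes the control observations from the others, so TA should always be of the form
TA = (.NOT.IND)+IND*function
where function is the expected proportion of subjects that do not respond, expressed in terms of the linear predictor, for the non-control part of the data. The variate TB should always be of the form
TB = MAXY*deriv_fn
where deriv_fn is the derivative of function with respect to the linear predictor (LP), taken with a positive sign. In both examples here, function decreases as LP increases, and TB is calculated as a positive quantity.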
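The Python sketch below mirrors the two WADLINK variants, assuming that IND = 1 marks treated observations. The logit case is exactly as given. The probit case uses the survival form 1 − Φ(LP), i.e. IND*(one-NORMAL(LP)), which is the form that TA = (.NOT.IND)+IND*function requires. A central-difference check confirms that, for treated units, TB equals MAXY times the magnitude of dTA/dLP. All function names are hypothetical.

```python
import math

ROOT2PI = math.sqrt(2.0 * math.pi)

def link_logit(lp: float, ind: int, maxy: float):
    """TA and TB as computed by the logit WADLINK."""
    ta = (1 - ind) + ind / (1.0 + math.exp(lp))
    tb = maxy * math.exp(lp) * ta * ta
    return ta, tb

def link_probit(lp: float, ind: int, maxy: float):
    """TA and TB for the probit version; the standard normal distribution
    function stands in for Genstat's NORMAL."""
    normal_cdf = 0.5 * (1.0 + math.erf(lp / math.sqrt(2.0)))
    ta = (1 - ind) + ind * (1.0 - normal_cdf)
    tb = maxy * math.exp(-0.5 * lp * lp) / ROOT2PI
    return ta, tb

def check_tb(link, lp: float, maxy: float = 200.0, h: float = 1e-6) -> bool:
    """For a treated unit (ind=1), compare TB with MAXY times the size of a
    central-difference estimate of dTA/dLP."""
    slope = (link(lp + h, 1, maxy)[0] - link(lp - h, 1, maxy)[0]) / (2.0 * h)
    return math.isclose(link(lp, 1, maxy)[1], maxy * abs(slope), rel_tol=1e-6)
```

Calling `check_tb` is a quick way to catch a sign or scaling slip when writing a user-defined link, because TA and TB must agree in this way.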
If LINK or DISTRIBUTION is unset but no user versions of WADINITIAL, WADLINK and WADDISTRIBUTION are supplied, the versions given here (for a logit link and Poisson error distribution) will be used.
A debt is owed to Dr J. Parrott of Pfizer Central Research, Sandwich, UK, for his support and encouragement of this work.
Action with RESTRICT
If the Y-variate is restricted, only the specified subset of the units will be included in the analysis.
References
Lane, P.W. (1989). Procedure GLM. In: Genstat Procedure Library Release 1.3[2] (ed. R.W. Payne & G.M. Arnold), 80-82.
Smith, D.M. & Morgan, B.J.T. (1989). Extended models for Wadley’s Problem. GLIM Newsletter, 18, 21-28.
See also
Procedure: PROBITANALYSIS.
Commands for: Regression analysis.
Example
CAPTION 'WADLEY example',\
        'Data from Smith & Morgan, GLIM Newsletter, 18, 1989.';\
        STYLE=meta,plain
VARIATE [NVALUES=70] Dose,Count
READ Dose,Count
0 219 0 228 0 202 0 237 0 228 0 204 0 217 0 190 0 224 0 218
1 167 1 158 1 158 1 175 1 167 5 105 5 123 5 105 5 105 5 105
10 88 10 88 10 61 10 61 10 88 50 61 50 44 50 35 50 35 50 44
1 166 1 158 1 181 1 143 1 159 5 97 5 112 5 88 5 120 5 103
10 78 10 80 10 75 10 74 10 102 50 49 50 40 50 57 50 51 50 40
1 160 1 143 1 148 1 135 1 142 5 101 5 81 5 82 5 94 5 74
10 54 10 42 10 52 10 48 10 63 50 32 50 15 50 16 50 19 50 23 :
FACTOR [LEVELS=2] Control
& [LEVELS=3; VAL=30(1),20(2,3)] Group
" Note: the Control observations must be assigned to one of the three levels
  of Group as otherwise the model is overparameterised; here the ten control
  observations have been assigned to level 1."
CALCULATE Control= 1+(Dose>0)
& LDose = LOG(Dose + (Dose==0))
CAPTION !t('Fitting parallel linear regressions in log dose:',\
        'logit link and Poisson error.')
WADLEY [DISTRIBUTION=poisson; LINK=logit; TERMS=Group+LDose;\
       CONTROL=Control] Count
CAPTION 'Allow for heterogeneity: quasi-likelihood, scaled Poisson error.'
FACTOR [LEVELS=13] Full; VALUES=!(10(1),5(2...13))
WADLEY [DISTRIBUTION=qlscaledpoisson; TERMS=Group+LDose; CONTROL=Control;\
       MAXIMAL=Full] Count