TFIT directive

Estimates parameters in Box-Jenkins models for time series.

Options

PRINT = string tokens What to print (model, summary, estimates, correlations, monitoring); default mode,summ,esti
LIKELIHOOD = string token Method of likelihood calculation (exact, leastsquares, marginal); default exac
CONSTANT = string token How to treat the constant (estimate, fix); default esti
RECYCLE = string token Whether to continue from previous estimation (yes, no); default no
WEIGHTS = variate Weights; default *
MVREPLACE = string token Whether to replace missing values by their estimates (yes, no); default no
FIX = variate Defines constraints on parameters (ordered as in each model, tf models first): zeros fix parameters, parameters with equal numbers are constrained to be equal; default *
METHOD = string token Whether to carry out full iterative estimation, to carry out just one iterative step, to perform no steps but still give parameter standard deviations, or only to initialize for forecasting by regenerating residuals (full, onestep, zerostep, initialize); default full
MAXCYCLE = scalar Maximum number of iterations; default 15
TOLERANCE = scalar Criterion for convergence; default 0.0004
SAVE = identifier To name save structure, or supply save structure with transfer-functions; default * i.e. transfer-functions taken from the latest model

Parameters

SERIES = variate Time series to be modelled (output series)
TSM = TSM Model for output series
BOXCOXMETHOD = string token How to treat transformation parameter in output series (fix, estimate); default fix
RESIDUALS = variate To save residual series

Description

The main use of TFIT is to fit parameters to time-series models, although you can also use it to initialize for the TFORECAST directive, even when the model parameters are already known. TFIT was originally called ESTIMATE, but was renamed in Release 14 to emphasize its status as a time-series command. The earlier name (ESTIMATE) was retained to allow previous programs to continue to run, but this may be removed in a future release.

You need to define a TSM structure before using TFIT, to provide the setting for the TSM parameter. You may also wish to give a TRANSFERFUNCTION statement, for example if you wish to specify explanatory variables for regression with ARIMA errors, or to define transfer-function models. In many applications of estimating a univariate ARIMA model, you will need only a simple form of the directive, such as:

TFIT Daylength; TSM=Erp

The SERIES parameter specifies the variate holding the time series data to which the model is to be fitted.

The TSM parameter specifies the ARIMA model that is to be fitted to the time-series data. This TSM must already have been declared and its ORDERS must have been set. If the LAGS parameter of the TSM has been set, the lags must have been given values. However, if the PARAMETERS of the TSM model have been set, these need not have been declared previously nor given values. When the parameter values are not set, default values are used: these are all zero, except for the transformation parameter, which is set to 1.0 if it is not to be estimated (see BOXCOXMETHOD and FIX below). Any parameter values that you do specify will be used as initial values for the parameters in the model; Genstat replaces any missing values by the default values. If any group of autoregressive or moving-average parameters do not satisfy the required conditions for stationarity or invertibility, all the parameters to be estimated are reset by Genstat to the default values. After TFIT, the parameters of the TSM contain the estimated parameter values.

The BOXCOXMETHOD parameter allows you to estimate the transformation parameter λ.

The RESIDUALS parameter saves the estimated innovations (or residuals). The residuals are calculated for $t = t_0, \ldots, N$, where $t_0 = 1 + p + d - q$ for a simple ARIMA model. If $t_0 > 1$, missing values will be inserted for $t = 1, \ldots, t_0 - 1$.

The PRINT option controls printed output. If you specify monitoring, then at each cycle of the iterative process of estimation, Genstat prints the deviance for the current fitted model, together with the current estimates of model parameters. The format is simple, with the minimum of description, to let you judge easily how quickly the process is converging. The other settings of PRINT control output at the end of the iterative process. If you specify model, the model is briefly described, giving the identifier of the series and the time-series model, together with the orders of the model. If you specify summary, the deviance of the final model is printed, along with the residual number of degrees of freedom. If you specify estimates, the estimates of the model parameters are printed in a descriptive format, together with their estimated standard errors and reference numbers. If you specify correlations, the correlations between estimates of parameters are printed, with reference numbers to identify the parameters.

The LIKELIHOOD option specifies the criterion that Genstat minimizes to obtain the estimates of the parameters: this is described in the next section. The default setting exact is recommended for most applications.

You can use the CONSTANT option to specify whether Genstat is to estimate the constant term c in the model. If CONSTANT=fix, the constant is held at the value given in the initial parameter values; this need not be zero.

The RECYCLE option allows a previous TFIT statement to continue; this can save computing time. If RECYCLE=yes, the most recent TFIT statement is continued, unless the SAVE option has been set to the save structure from some other TFIT statement. The SERIES and TSM settings are then taken from this previous TFIT statement: Genstat ignores any specified in the current statement. Most of the settings of other parameters and options are carried over from the previous statement, and new values are ignored. However, there are some exceptions. You can change the RESIDUALS variate, you can reset MAXCYCLE to the number of further iterations you require, and you can change the settings of TOLERANCE and PRINT. You can also change the values of the variate in the WEIGHTS option; you can thus get reweighted estimation. You can change the values of the SERIES itself, although you cannot change missing values; if the MVREPLACE option was previously set to yes, you must put the original missing values back into the SERIES variate before the new TFIT statement.

The WEIGHTS option includes in the likelihood a weighted sum-of-squares term

$$\sum_{t=t_0}^{N} w_t a_t^2$$

where $w_t$, $t = 1, \ldots, N$, are provided by the WEIGHTS variate. The values of $w_t$ must be strictly positive. If $t_0 < 1$, where $t_0 = 1 + d + p - q$, then $w_t$ is taken as 1 for $t < 1$.

The MVREPLACE option allows you to request any missing values in the time-series to be replaced by their estimates after estimation. Genstat will always estimate the missing values, irrespective of the setting of MVREPLACE; so you can also obtain these estimates later from TKEEP.

The FIX option allows you to place simple constraints on parameter values throughout the estimation. The units of the FIX variate correspond to the parameters of the TSM, excluding the innovation variance. The values of the FIX variate are used to define the parameter constraints and must be integers. If an element of the FIX variate is set to 0, the corresponding parameter is constrained to remain at its initial setting. If an element is not 0, and the value is unique in the FIX variate, the parameter is estimated without any special constraint. If two or more values are equal, the corresponding parameters are constrained to be equal throughout the estimation. The number that you give to a parameter by FIX will appear as the reference number of the parameter in the printed model and correlation matrix. This option overrides any setting of CONSTANT and BOXCOXMETHOD.
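As an illustration only, the rules above can be sketched in Python (this is not Genstat code, and the function name `interpret_fix` is hypothetical). The sketch sorts the positions of a FIX variate into parameters held at their initial values, parameters estimated freely, and groups constrained to be equal. Positions are counted from 0 here, following Python convention.

```python
from collections import defaultdict

def interpret_fix(fix):
    """Classify parameter positions according to the FIX rules.

    Returns (fixed, free, tied):
      fixed - positions whose FIX value is 0 (held at the initial setting),
      free  - positions whose non-zero FIX value is unique,
      tied  - dict mapping each shared reference number to the positions
              that are constrained to be equal.
    """
    if any(int(value) != value for value in fix):
        raise ValueError("FIX values must be integers")
    positions = defaultdict(list)
    for index, value in enumerate(fix):
        positions[int(value)].append(index)
    fixed = positions.pop(0, [])
    free = sorted(idx[0] for idx in positions.values() if len(idx) == 1)
    tied = {ref: idx for ref, idx in positions.items() if len(idx) > 1}
    return fixed, free, tied

# Hypothetical FIX variate for a model with five parameters.
fixed, free, tied = interpret_fix([0, 1, 2, 2, 3])
print(fixed, free, tied)  # [0] [1, 4] {2: [2, 3]}
```

Here the first parameter stays at its initial value, the second and fifth are estimated without constraint, and the third and fourth are estimated subject to being equal.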

The MAXCYCLE option specifies the maximum number of iterations to be performed.

The TOLERANCE option specifies the convergence criterion. Genstat decides that convergence has occurred if the fractional reduction in the deviance in successive iterations is less than the specified value, provided also that the search is not encountering numerical difficulties that force the step length in the parameter space to be severely limited. You can use monitoring to judge whether, for all practical purposes, the iterations have converged. Genstat gives warnings if the specified number of iterations is completed without convergence, or if the search procedure fails to find a reduced value of the deviance despite a very short step length. Such an outcome may be due to complexities in the likelihood function that make the search difficult, but can be due to your specifying too small a value for TOLERANCE.
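The convergence rule can be pictured with a short Python sketch (illustrative only; the function name and the `step_limited` flag are assumptions standing in for Genstat's internal check on step length, which is not documented here). It declares convergence when the fractional reduction in the deviance is below the tolerance and the search is not being forced to take severely limited steps.

```python
def has_converged(previous_deviance, current_deviance,
                  tolerance=0.0004, step_limited=False):
    """Return True when the fractional reduction in deviance is below tolerance.

    step_limited is a hypothetical flag meaning that numerical difficulties
    are severely restricting the step length; convergence is not declared
    in that case.
    """
    if previous_deviance <= 0:
        raise ValueError("previous deviance must be positive")
    reduction = (previous_deviance - current_deviance) / previous_deviance
    return (not step_limited) and reduction < tolerance

# Made-up deviance values from two successive iterations.
print(has_converged(152.40, 152.38))  # True: reduction is about 0.00013
print(has_converged(152.40, 151.00))  # False: reduction is about 0.0092
```

The default of 0.0004 mirrors the default of the TOLERANCE option; a much smaller value demands more iterations and may trigger the non-convergence warnings described above.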

The SAVE option allows you to save the time-series save structure produced by TFIT. You can use this in further TFIT statements with RECYCLE=yes, or in TFORECAST statements. It can also be used by the TDISPLAY and TKEEP directives. Genstat automatically saves the structure from the most recent TFIT statement, but this is over-written when the next TFIT statement is executed, unless you have used SAVE to give it an identifier of its own. You can access the current time-series save structure by the SPECIAL option of the GET directive, and reset it by the TSAVE option of the SET directive.

The METHOD option has four possible settings. The default setting is full which gives the usual estimation to convergence or until the maximum number of iterations has been reached.

With the setting METHOD=initialize, TFIT carries out only the residual regeneration steps (that is, calculation of $a_t$ for $t = t_0, \ldots, N$) which are needed before TFORECAST can be used. If the model has just been estimated using the default full setting, this is unnecessary. The setting initialize is useful when the time series is supplied with a known model and a minimal amount of calculation is wanted to prepare or initialize for forecasting. None of the model parameters are changed, and no standard errors of parameter estimates are available. Missing values in the series are estimated, so this setting provides an efficient way of getting their values when the time series model is known; they can then be obtained using TKEEP. The deviance value is also available from TKEEP. This setting is therefore useful for efficient calculation of deviance values when you want to plot the shape of the deviance as a function of parameter values.

With the setting METHOD=zerostep the effect is the same as for initialize except that TFIT also calculates the standard errors of the parameters as if they had just been estimated. These can be used together with other quantities available from TKEEP to construct confidence intervals and carry out tests on the parameter values, which remain unchanged except that the innovation variance in the ARIMA model is replaced by its estimate conditional on all other parameters.

The setting METHOD=onestep gives the same results as specifying the option MAXCYCLE=1 in TFIT. It is convenient for carrying out quick tests of model parameters.

To explain the LIKELIHOOD option, we need to describe the estimation of ARIMA models in more detail. You may want to skip this if you are doing fairly routine work.

The first step in deriving the likelihood for a simple model is to calculate

$$w_t = \nabla^d y_t - c, \qquad t = 1+d, \ldots, N$$

This has a multivariate Normal distribution with dispersion matrix $V\sigma_a^2$, where $V$ depends only on the autoregressive and moving-average parameters. The likelihood is then proportional to

$$\{ \sigma_a^{2m} |V| \}^{-1/2} \exp\{ -w' V^{-1} w / 2\sigma_a^2 \}$$

where $m = N - d$. In practice Genstat evaluates this by using the formula

$$w' V^{-1} w = W + \sum_{t=t_0}^{N} a_t^2 = S$$

where $t_0 = 1 + d + p - q$. The term $W$ is a quadratic form in the $p$ values $w_{1+d-q}, \ldots, w_{p+d-q}$: it takes account of the starting-value problem for regenerating the innovations $a_t$, and avoids losing information as would happen if the process used only a conditional sum-of-squares function. If $q > 0$, Genstat introduces unobserved values of $w_{1+d-q}, \ldots, w_d$ in order to calculate the sum $S$. Genstat uses linear least-squares to calculate these $q$ starting values for $w$, thus minimizing $S$. We shall call them back-forecasts, though if $p > 0$ they are actually computationally convenient linear functions of the proper back-forecasts. We shall call $S$ the sum-of-squares function: it is the sum of the quadratic form and the sum-of-squares term, and is identical to the value expressed by Box & Jenkins (1970) as

$$\sum_{t=-\infty}^{N} a_t^2$$

using infinite back-forecasting; that is, using:

$$W = \sum_{t=-\infty}^{t_0-1} a_t^2$$

The values $a_t$ for $t = t_0, \ldots, N$ agree precisely with those of Box and Jenkins.

To clarify all this, consider examples with no differencing; that is, $d = 0$. If $p = 0$ and $q = 1$ then $W = 0$ and $t_0 = 0$, and one back-forecast $w_0$ is introduced. If $p = 1$ and $q = 0$ then $W = (1 - \phi_1^2) w_1^2$ and $t_0 = 2$, and no back-forecasts are needed. If $p = q = 1$ then $W = (1 - \phi_1^2) w_0^2$ and $t_0 = 1$, and so one back-forecast $w_0$ is needed. In this case the proper back-forecast is in fact $w_0 / (1 - \theta_1 \phi_1)$.
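The index bookkeeping in these three examples can be checked with a few lines of Python (an illustrative sketch, not Genstat code; both function names are hypothetical). It uses the starting index t0 = 1 + d + p - q and takes the q back-forecast positions to run from 1 + d - q up to d.

```python
def first_residual_index(p, d, q):
    """t0 = 1 + d + p - q: first time index at which a_t is regenerated."""
    return 1 + d + p - q

def back_forecast_indices(d, q):
    """Indices 1+d-q, ..., d of the q unobserved starting values of w."""
    return list(range(1 + d - q, d + 1))

# The three undifferenced (d = 0) cases discussed in the text.
for p, d, q in [(0, 0, 1), (1, 0, 0), (1, 0, 1)]:
    print((p, d, q), first_residual_index(p, d, q), back_forecast_indices(d, q))
# (0, 0, 1) 0 [0]
# (1, 0, 0) 2 []
# (1, 0, 1) 1 [0]
```

The empty list in the second case corresponds to the statement that a pure autoregressive model needs no back-forecasts.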

The value of $|V|$ is a by-product of calculating $W$ and the back-forecast. For example, if $p = 0$ and $q = 1$, then

$$|V| = 1 + \theta_1^2 + \ldots + \theta_1^{2N}$$

If $p = 1$ and $q = 0$,

$$|V| = 1 / (1 - \phi_1^2)$$

and if $p = q = 1$,

$$|V| = 1 + (\phi_1 - \theta_1)^2 (1 + \theta_1^2 + \ldots + \theta_1^{2N-2}) / (1 - \phi_1^2)$$
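These closed forms can be checked numerically. The Python sketch below is illustrative only and is not how Genstat computes the determinant. It assumes the model (1 - φ1 B) w_t = (1 - θ1 B) a_t with unit innovation variance, builds V from the standard ARMA(1,1) autocovariances (a textbook result, not given in this page), and compares a brute-force determinant with the p = q = 1 formula. Setting φ1 = 0 or θ1 = 0 recovers the other two cases. All function names are hypothetical.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def arma11_dispersion(phi, theta, n):
    """n x n matrix V for (1 - phi B) w_t = (1 - theta B) a_t, sigma_a^2 = 1."""
    gamma = [0.0] * n
    gamma[0] = (1 + theta ** 2 - 2 * phi * theta) / (1 - phi ** 2)
    if n > 1:
        gamma[1] = (1 - phi * theta) * (phi - theta) / (1 - phi ** 2)
    for k in range(2, n):
        gamma[k] = phi * gamma[k - 1]
    return [[gamma[abs(i - j)] for j in range(n)] for i in range(n)]

def det_v_closed_form(phi, theta, n):
    """1 + (phi - theta)^2 (1 + theta^2 + ... + theta^(2n-2)) / (1 - phi^2)."""
    geometric = sum(theta ** (2 * k) for k in range(n))
    return 1 + (phi - theta) ** 2 * geometric / (1 - phi ** 2)

# Arbitrary parameter values: MA(1), AR(1) and ARMA(1,1) respectively.
for phi, theta, n in [(0.0, 0.4, 5), (0.6, 0.0, 5), (0.5, 0.3, 6)]:
    brute = determinant(arma11_dispersion(phi, theta, n))
    print(phi, theta, n, abs(brute - det_v_closed_form(phi, theta, n)) < 1e-9)
```

With θ1 = 0 the geometric sum collapses to 1 and the formula reduces to 1 / (1 - φ1²); with φ1 = 0 it reduces to 1 + θ1² + … + θ1^(2N).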

Concentrating the likelihood over $\sigma_a^2$ by setting $\sigma_a^2 = S/m$ yields a value proportional to $\{ |V|^{1/m} S \}^{-m/2}$.

The default setting of the LIKELIHOOD option is exact. In this case the concentrated likelihood is maximized, by minimizing the quantity

$$D = |V|^{1/m} S$$

which is called the deviance.

The setting leastsquares specifies that Genstat is to minimize only the sum-of-squares term S. This criterion corresponds to the back-forecasting sum-of-squares used by Box & Jenkins (1970), and will in many cases give estimates close to those of the exact likelihood. However, some discrepancy arises if the series is short or the model is close to the invertibility boundary. This is because of limitations on the back-forecasting procedure, as described in the algorithms of Box & Jenkins (1970). The deviance value D that Genstat prints is, with this setting, simply S.

When you use exact likelihood, the factor $|V|^{1/m}$ reduces bias in the estimates of the parameters; you would get bias if you used leastsquares instead. However, $|V|^{1/m}$ is generally close to one, unless the series is short or the model is either seasonal or close to the boundaries of invertibility or stationarity. The leastsquares setting is therefore adequate for most long, non-seasonal sets of data; using it may reduce the computation time by up to 50%. When you specify that Genstat is to estimate the parameter $\lambda$ of the Box-Cox transformation, Genstat also includes the Jacobian of the transformation in the likelihood function. The result is an extra factor $G^{-2(\lambda-1)}$ in the definition of the deviance, $G$ being the geometric mean of the data,

$$G = \left( \prod_{t=1}^{N} y_t \right)^{1/N}$$
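As a small numerical illustration (Python, not Genstat; the function names and the data are made up), the geometric mean is best computed through logarithms to avoid overflow in the product, and the extra deviance factor is then G raised to the power -2(λ - 1).

```python
import math

def geometric_mean(y):
    """Geometric mean of strictly positive data, computed on the log scale."""
    if any(value <= 0 for value in y):
        raise ValueError("the geometric mean requires strictly positive data")
    return math.exp(math.fsum(math.log(value) for value in y) / len(y))

def jacobian_factor(y, lam):
    """Extra deviance factor G ** (-2 * (lam - 1)) used when lambda is estimated."""
    return geometric_mean(y) ** (-2.0 * (lam - 1.0))

y = [1.0, 2.0, 4.0]              # made-up series with geometric mean 2
print(geometric_mean(y))         # approximately 2
print(jacobian_factor(y, 0.0))   # approximately 4 (log transformation)
print(jacobian_factor(y, 1.0))   # exactly 1.0 (no transformation)
```

The factor equals 1 when λ = 1, so it only changes the deviance when the estimated transformation departs from the identity.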

Note that this is not included unless λ is being estimated, even if λ≠1.

You can treat differences in Nlog(D) as a chi-square variable in order to test nested models: this is supported by asymptotic theory, and by experience with models that have moderately large sample sizes. Similarly, you can select between different models by using Nlog(D)+2k as an information criterion, k being the number of estimated parameters. But both of these test procedures are questionable if the estimated models are close to the boundaries of invertibility or stationarity. Provided all the models that are being compared have the same orders of differencing, with the differenced series being of length m, it is recommended that mlog(D) be used rather than Nlog(D) in these tests since mlog(D) is precisely minus two multiplied by the log-likelihood as defined above.
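The two procedures can be sketched in Python as follows (illustrative only; the function names are hypothetical and the deviance values are invented, not taken from any Genstat run). The sketch uses m log(D), as recommended above for models sharing the same orders of differencing.

```python
import math

def likelihood_ratio_statistic(deviance_restricted, deviance_full, m):
    """Difference in m*log(D) between a restricted model and a fuller nested model."""
    return m * (math.log(deviance_restricted) - math.log(deviance_full))

def information_criterion(deviance, m, k):
    """m*log(D) + 2k, where k is the number of estimated parameters."""
    return m * math.log(deviance) + 2 * k

# Invented values: differenced series of length m = 131, a restricted model
# with k = 2 estimated parameters and a fuller model with k = 3.
m = 131
d_restricted, d_full = 0.1900, 0.1820

lr = likelihood_ratio_statistic(d_restricted, d_full, m)
print(lr)  # compare with a chi-square on 1 d.f. (5% point about 3.84)

ic_restricted = information_criterion(d_restricted, m, 2)
ic_full = information_criterion(d_full, m, 3)
print(ic_restricted, ic_full)  # the smaller value is preferred
```

The difference between the two criterion values is the likelihood-ratio statistic minus twice the number of extra parameters, so the two procedures differ only in the threshold they apply.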

The setting marginal is relevant mainly when TFIT is used for regression with ARIMA errors. (This requires a TRANSFERFUNCTION statement beforehand to specify the explanatory variables.) The likelihood for the model is defined as that of the univariate error series et which is defined in general by

$$e_t = y_t - b_1 x_{1,t} - \ldots - b_m x_{m,t}$$

(the $x_i$ being $m$ explanatory variables). The constant term therefore appears in the model after any differencing of $e_t$; for example

$$e_t = c + (1 - \theta_1 B) a_t$$

You can get bias in the estimates of the parameters of an ARIMA model because the regression is estimated at the same time. You can guard against this by specifying LIKELIHOOD=marginal. This can be particularly important if the series are short or if you use many explanatory variables (Tunnicliffe Wilson 1989). The deviance is now defined as

$$D = S \left( |X' V^{-1} X| \, |V| \right)^{1/m}$$

where m is reduced by the number of regressors (including the constant term) and the columns of X are the differenced explanatory series: the other terms are as in the exact likelihood.

You can use the marginal setting also for univariate ARIMA modelling, when the constant term is the only explanatory term. Furthermore, Genstat deals with missing values in the response variate by doing a regression on indicator variates; these too are included in the X matrix. However, you cannot use marginal likelihood and estimate a transformation parameter in either the transfer-function model or an ARIMA model. Neither can you use it if you set the FIX option in TFIT. In these cases Genstat automatically resets the LIKELIHOOD option to exact.

At every iteration with the setting LIKELIHOOD=marginal, the regression coefficients are the maximum-likelihood estimates conditional upon the estimated values of the parameters of the ARIMA model: these are also the generalized least-squares estimates, conditioned in the same way. This is so even if MAXCYCLE=0; that is, the coefficients of the regression are re-estimated even at iteration 0. Therefore you must not use the marginal setting with the option METHOD=initialize to initialize for TFORECAST. You can compare deviance values that were obtained using marginal likelihood only for models with the same explanatory variables and the same differencing structure in the error model.

Options: PRINT, LIKELIHOOD, CONSTANT, RECYCLE, WEIGHTS, MVREPLACE, FIX, METHOD, MAXCYCLE, TOLERANCE, SAVE.

Parameters: SERIES, TSM, BOXCOXMETHOD, RESIDUALS.

Action with RESTRICT

The SERIES variate can be restricted, but this must be to a contiguous set of units.

References

Box, G.E.P. & Jenkins, G.M. (1970). Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco.

Tunnicliffe Wilson, G. (1989). On the use of marginal likelihood in time-series model estimation. Journal of the Royal Statistical Society, Series B, 51, 15-27.

See also

Directives: TSM, FTSM, TRANSFERFUNCTION, TDISPLAY, TFILTER, TFORECAST, TKEEP, TSUMMARIZE, CORRELATE, FOURIER.

Procedures: BJESTIMATE, BJFORECAST, BJIDENTIFY, MOVINGAVERAGE, PERIODTEST, PREWHITEN, REPPERIODOGRAM, SMOOTHSPECTRUM.

Commands for: Time series.

Example

" Example TFIT-1: Fitting a seasonal ARIMA model"

VARIATE time; VALUES=!(1...120)
FILEREAD [NAME='%gendir%/examples/TFIT-1.DAT'] apt

" Display the correlation structure of the logged data"
CALCULATE lapt = LOG(apt)
BJIDENTIFY [GRAPHICS=high; WINDOWS=!(5,6,7,8)] lapt

" Calculate the autocorrelations of the differenced and seasonally
  differenced series"
CALCULATE ddslapt = DIFFERENCE(DIFFERENCE(lapt; 12); 1)
CORRELATE [PRINT=auto; MAXLAG=48] ddslapt; AUTO=ddsr

" Define a model for the series: 
  IMA(1) (that is, a model with a single moving-average parameter
          applied to the differences of the series)
  plus a seasonal IMA(1) component"
TSM [MODELTYPE=arima] airpass; ORDERS=!((0,1,1)2,12)
" Form preliminary estimates of the parameters, using a log transformation
  (BOXCOX=0 is equivalent to log)"
FTSM [PRINT=model] airpass; ddsr; BOXCOX=0
" Get the best estimates, fixing the constant"
TFIT [CONSTANT=fix] SERIES=apt; TSM=airpass

" Graph the residuals against time"
TKEEP RESID=resids
DGRAPH [WINDOW=3; KEYWINDOW=0; TITLE='Residuals vs Time'] resids; time

" Test the independence of the residuals"
CORRELATE [GRAPH=auto; MAXLAG=48] resids; TEST=S
PRINT 'Test statistic for independence of the residuals',S
Updated on June 18, 2019
