TFIT directive

Estimates parameters in Box-Jenkins models for time series.

Options

`PRINT` = string tokens	What to print (`model`, `summary`, `estimates`, `correlations`, `monitoring`); default `mode,summ,esti`
`LIKELIHOOD` = string token	Method of likelihood calculation (`exact`, `leastsquares`, `marginal`); default `exac`
`CONSTANT` = string token	How to treat the constant (`estimate`, `fix`); default `esti`
`RECYCLE` = string token	Whether to continue from previous estimation (`yes`, `no`); default `no`
`WEIGHTS` = variate	Weights; default `*`
`MVREPLACE` = string token	Whether to replace missing values by their estimates (`yes`, `no`); default `no`
`FIX` = variate	Defines constraints on parameters (ordered as in each model, tf models first): zeros fix parameters, parameters with equal numbers are constrained to be equal; default `*`
`METHOD` = string token	Whether to carry out full iterative estimation, to carry out just one iterative step, to perform no steps but still give parameter standard deviations, or only to initialize for forecasting by regenerating residuals (`full`, `onestep`, `zerostep`, `initialize`); default `full`
`MAXCYCLE` = scalar	Maximum number of iterations; default 15
`TOLERANCE` = scalar	Criterion for convergence; default 0.0004
`SAVE` = identifier	To name save structure, or supply save structure with transfer-functions; default `*` i.e. transfer-functions taken from the latest model

Parameters

`SERIES` = variate	Time series to be modelled (output series)
`TSM` = TSM	Model for output series
`BOXCOXMETHOD` = string token	How to treat transformation parameter in output series (`fix`, `estimate`); default `fix`
`RESIDUALS` = variate	To save residual series

Description

The main use of TFIT is to fit parameters to time-series models, although you can also use it to initialize for the TFORECAST directive, even when the model parameters are already known. TFIT was originally called ESTIMATE, but was renamed in Release 14 to emphasize its status as a time-series command. The earlier name (ESTIMATE) was retained to allow previous programs to continue to run, but this may be removed in a future release.

You need to define a TSM structure before using TFIT, to provide the setting for the TSM parameter. You may also wish to give a TRANSFERFUNCTION statement, for example if you wish to specify explanatory variables for regression with ARIMA errors, or to define transfer-function models. In many applications of estimating a univariate ARIMA model, you will need only a simple form of the directive, such as:

TFIT Daylength; TSM=Erp

The SERIES parameter specifies the variate holding the time series data to which the model is to be fitted.

The TSM parameter specifies the ARIMA model that is to be fitted to the time-series data. This TSM must already have been declared and its ORDERS must have been set. If the LAGS parameter of the TSM has been set, the lags must have been given values. However, if the PARAMETERS of the TSM have been set, these need not have been declared previously nor given values. When the parameter values are not set, default values are used: these are all zero, except for the transformation parameter, which is set to 1.0 if it is not to be estimated (see BOXCOXMETHOD and FIX below). Any parameter values that you do specify will be used as initial values for the parameters in the model; Genstat replaces any missing values by the default values. If any group of autoregressive or moving-average parameters does not satisfy the required conditions for stationarity or invertibility, all the parameters to be estimated are reset by Genstat to the default values. After TFIT, the parameters of the TSM contain the estimated parameter values.

The BOXCOXMETHOD parameter allows you to estimate the transformation parameter λ.

The RESIDUALS parameter saves the estimated innovations (or residuals). The residuals are calculated for t=t₀…N, where t₀=1+p+d–q for a simple ARIMA model. If t₀>1, missing values will be inserted for t=1…t₀-1.
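The index arithmetic above can be sketched in a few lines of Python. This is only an illustration for a simple (non-seasonal) ARIMA(p,d,q) model; the function names are hypothetical and are not part of Genstat.

```python
def first_residual_index(p: int, d: int, q: int) -> int:
    """Return t0 = 1 + p + d - q for a simple ARIMA(p, d, q) model."""
    return 1 + p + d - q


def leading_missing_count(p: int, d: int, q: int) -> int:
    """Number of missing values at the start of the residual series.

    Missing values fill t = 1 ... t0 - 1 when t0 > 1; otherwise there are none.
    """
    return max(first_residual_index(p, d, q) - 1, 0)


# ARIMA(1,1,0): t0 = 3, so the residuals at t = 1 and t = 2 are missing.
print(first_residual_index(1, 1, 0), leading_missing_count(1, 1, 0))
# ARIMA(0,1,1): t0 = 1, so a residual is available at every time point.
print(first_residual_index(0, 1, 1), leading_missing_count(0, 1, 1))
```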

The PRINT option controls printed output. If you specify monitoring, then at each cycle of the iterative process of estimation, Genstat prints the deviance for the current fitted model, together with the current estimates of model parameters. The format is simple with the minimum of description, to let you judge easily how quickly the process is converging. The other settings of PRINT control output at the end of the iterative process. If you specify model, the model is briefly described, giving the identifier of the series and the time-series model, together with the orders of the model. If you specify summary, the deviance of the final model is printed, along with the residual number of degrees of freedom. If you specify estimates, the estimates of the model parameters are printed in a descriptive format, together with their estimated standard errors and reference numbers. If you specify correlations, the correlations between estimates of parameters are printed, with reference numbers to identify the parameters.

The LIKELIHOOD option specifies the criterion that Genstat minimizes to obtain the estimates of the parameters: this is described in the next section. The default setting exact is recommended for most applications.

You can use the CONSTANT option to specify whether Genstat is to estimate the constant term c in the model. If CONSTANT=fix, the constant is held at the value given in the initial parameter values; this need not be zero.

The RECYCLE option allows a previous TFIT statement to continue; this can save computing time. If RECYCLE=yes, the most recent TFIT statement is continued, unless the SAVE option has been set to the save structure from some other TFIT statement. The SERIES and TSM settings are then taken from this previous TFIT statement: Genstat ignores any specified in the current statement. Most of the settings of other parameters and options are carried over from the previous statement, and new values are ignored. However, there are some exceptions. You can change the RESIDUALS variate, you can reset MAXCYCLE to the number of further iterations you require, and you can change the settings of TOLERANCE and PRINT. You can also change the values of the variate in the WEIGHTS option; you can thus get reweighted estimation. You can change the values of the SERIES itself, although you cannot change missing values; if the MVREPLACE option was previously set to yes, you must put the original missing values back into the SERIES variate before the new TFIT statement.

The WEIGHTS option includes in the likelihood a weighted sum-of-squares term

$$ \sum_{t=t_0}^{N} w_t a_t^2 $$

where w_t, t=1…N are provided by the WEIGHTS variate. The values of w_t must be strictly positive. If t₀<1, where t₀=1+d+p–q, then w_t is taken as 1 for t<1.

The MVREPLACE option allows you to request any missing values in the time-series to be replaced by their estimates after estimation. Genstat will always estimate the missing values, irrespective of the setting of MVREPLACE; so you can also obtain these estimates later from TKEEP.

The FIX option allows you to place simple constraints on parameter values throughout the estimation. The units of the FIX variate correspond to the parameters of the TSM, excluding the innovation variance. The values of the FIX variate are used to define the parameter constraints and must be integers. If an element of the FIX variate is set to 0, the corresponding parameter is constrained to remain at its initial setting. If an element is not 0, and the value is unique in the FIX variate, the parameter is estimated without any special constraint. If two or more values are equal, the corresponding parameters are constrained to be equal throughout the estimation. The number that you give to a parameter by FIX will appear as the reference number of the parameter in the printed model and correlation matrix. This option overrides any setting of CONSTANT and BOXCOXMETHOD.
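The coding rules for the FIX variate can be illustrated with a short Python sketch. It only classifies positions in a list of integers according to the rules described above (0 means held at the initial value, a unique non-zero value means freely estimated, repeated values mean constrained to be equal). The function name and the example values are hypothetical.

```python
from collections import defaultdict


def interpret_fix(fix):
    """Classify parameter positions (1-based) from FIX-style integer codes.

    Returns three lists: positions held fixed, positions estimated freely,
    and groups of positions constrained to be equal.
    """
    groups = defaultdict(list)
    for position, code in enumerate(fix, start=1):
        groups[code].append(position)

    fixed = groups.pop(0, [])  # code 0: parameter stays at its initial setting
    free = [pos[0] for _, pos in sorted(groups.items()) if len(pos) == 1]
    tied = [pos for _, pos in sorted(groups.items()) if len(pos) > 1]
    return fixed, free, tied


# Hypothetical FIX values for a model with five constrainable parameters.
fixed, free, tied = interpret_fix([0, 0, 1, 2, 2])
print("held fixed:", fixed)
print("estimated freely:", free)
print("constrained equal:", tied)
```

In this example the first two parameters stay at their initial values, the third is estimated without constraint, and the fourth and fifth are estimated subject to being equal.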

The MAXCYCLE option specifies the maximum number of iterations to be performed.

The TOLERANCE option specifies the convergence criterion. Genstat decides that convergence has occurred if the fractional reduction in the deviance in successive iterations is less than the specified value, provided also that the search is not encountering numerical difficulties that force the step length in the parameter space to be severely limited. You can use monitoring to judge whether, for all practical purposes, the iterations have converged. Genstat gives warnings if the specified number of iterations is completed without convergence, or if the search procedure fails to find a reduced value of the deviance despite a very short step length. Such an outcome may be due to complexities in the likelihood function that make the search difficult, but can be due to your specifying too small a value for TOLERANCE.
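The convergence rule can be sketched as follows. This is a simplified illustration, not Genstat's implementation: it assumes the fractional reduction is measured relative to the previous deviance, and the `step_limited` flag is a hypothetical stand-in for the condition that the search is encountering numerical difficulties. The deviance values are invented to resemble monitoring output.

```python
def has_converged(previous_deviance: float,
                  current_deviance: float,
                  tolerance: float = 0.0004,
                  step_limited: bool = False) -> bool:
    """Apply the fractional-reduction rule to two successive deviances.

    Assumption: the reduction is taken relative to the previous deviance.
    step_limited is a hypothetical flag; when it is set, convergence is
    not declared even if the reduction is small.
    """
    if step_limited:
        return False
    reduction = (previous_deviance - current_deviance) / previous_deviance
    return reduction < tolerance


# Invented deviances for cycles 0, 1, 2, 3.
deviances = [152.0, 131.4, 130.9, 130.88]
converged_at = next(
    (cycle for cycle in range(1, len(deviances))
     if has_converged(deviances[cycle - 1], deviances[cycle])),
    None,
)
print("first cycle meeting the criterion:", converged_at)
```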

The SAVE option allows you to save the time-series save structure produced by TFIT. You can use this in further TFIT statements with RECYCLE=yes, or in TFORECAST statements. It can also be used by the TDISPLAY and TKEEP directives. Genstat automatically saves the structure from the most recent TFIT statement, but this is over-written when the next TFIT statement is executed, unless you have used SAVE to give it an identifier of its own. You can access the current time-series save structure by the SPECIAL option of the GET directive, and reset it by the TSAVE option of the SET directive.

The METHOD option has four possible settings. The default setting is full which gives the usual estimation to convergence or until the maximum number of iterations has been reached.

With the setting METHOD=initialize, TFIT carries out only the residual regeneration steps (that is, calculation of a_t for t=t₀…N) which are needed before TFORECAST can be used. If the model has just been estimated using the default full setting, this is unnecessary. The setting initialize is useful when the time series is supplied with a known model and a minimal amount of calculation is wanted to prepare or initialize for forecasting. None of the model parameters are changed, and no standard errors of parameter estimates are available. Missing values in the series are estimated so this setting provides an efficient way of getting their values when the time series model is known; they can then be obtained using TKEEP. The deviance value is also available from TKEEP. This setting is therefore useful for efficient calculation of deviance values when you want to plot the shape of the deviance as a function of parameter values.

With the setting METHOD=zerostep the effect is the same as for initialize except that TFIT also calculates the standard errors of the parameters as if they had just been estimated. These can be used together with other quantities available from TKEEP to construct confidence intervals and carry out tests on the parameter values, which remain unchanged except that the innovation variance in the ARIMA model is replaced by its estimate conditional on all other parameters.

The setting METHOD=onestep gives the same results as specifying the option MAXCYCLE=1 in TFIT. It is convenient for carrying out quick tests of model parameters.

To explain the LIKELIHOOD option, we need to describe the estimation of ARIMA models in more detail. You may want to skip this if you are doing fairly routine work.

The first step in deriving the likelihood for a simple model is to calculate

$$ w_t = \nabla^d y_t - c, \qquad t = 1+d, \ldots, N $$

This has a multivariate Normal distribution with dispersion matrix Vσ_a², where V depends only on the autoregressive and moving-average parameters. The likelihood is then proportional to

$$ \left\{ \sigma_a^{2m} |V| \right\}^{-1/2} \exp\left\{ -\frac{w' V^{-1} w}{2\sigma_a^2} \right\} $$

where m=N–d. In practice Genstat evaluates this by using the formula

$$ w' V^{-1} w = W + \sum_{t=t_0}^{N} a_t^2 = S $$

where t₀=1+d+p–q. The term W is a quadratic form in the p values $w_{1+d-q}, \ldots, w_{p+d-q}$: it takes account of the starting-value problem for regenerating the innovations a_t, and avoids losing information as would happen if the process used only a conditional sum-of-squares function. If q>0, Genstat introduces unobserved values of $w_{1+d-q}, \ldots, w_d$ in order to calculate the sum S. Genstat uses linear least-squares to calculate these q starting values for w, thus minimizing S. We shall call them back-forecasts, though if p>0 they are actually computationally convenient linear functions of the proper back-forecasts. We shall call S the sum-of-squares function: it is the sum of the quadratic form and the sum-of-squares term, and is identical to the value expressed by Box & Jenkins (1970) as

$$ \sum_{t=-\infty}^{N} a_t^2 $$

using infinite back-forecasting; that is, using:

$$ W = \sum_{t=-\infty}^{t_0-1} a_t^2 $$

The values a_t for t=t₀…N agree precisely with those of Box and Jenkins.

To clarify all this, consider examples with no differencing; that is, d=0. If p=0 and q=1 then W=0 and t₀=0, and one back-forecast w₀ is introduced. If p=1 and q=0 then W=(1-φ₁²)w₁² and t₀=2, and no back-forecasts are needed. If p=q=1 then W=(1-φ₁²)w₀² and t₀=1, and so one back-forecast w₀ is needed. In this case the proper back-forecast is in fact w₀/(1-θ₁φ₁).

The value of |V| is a by-product of calculating W and the back-forecast. For example, if p=0 and q=1, then

$$ |V| = 1 + \theta_1^2 + \cdots + \theta_1^{2N} $$

If p=1 and q=0,

$$ |V| = \frac{1}{1 - \phi_1^2} $$

and if p=q=1,

$$ |V| = 1 + \frac{(\phi_1 - \theta_1)^2 \left( 1 + \theta_1^2 + \cdots + \theta_1^{2N-2} \right)}{1 - \phi_1^2} $$
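These closed forms can be checked numerically. The Python sketch below builds V for the model w_t = φ₁w_{t-1} + a_t – θ₁a_{t-1} (unit innovation variance, no differencing) from its autocovariances and compares a directly computed determinant with the three expressions. It is an independent check with hypothetical function names, and says nothing about how Genstat itself obtains |V|.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap changes the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def arma11_v(phi, theta, n):
    """Dispersion matrix V (innovation variance 1) for n observations."""
    gamma = [0.0] * n
    gamma[0] = (1 + theta ** 2 - 2 * phi * theta) / (1 - phi ** 2)
    if n > 1:
        gamma[1] = (1 - phi * theta) * (phi - theta) / (1 - phi ** 2)
    for k in range(2, n):
        gamma[k] = phi * gamma[k - 1]
    return [[gamma[abs(i - j)] for j in range(n)] for i in range(n)]


def det_ma1(theta, n):
    """1 + theta^2 + ... + theta^(2n)."""
    return sum(theta ** (2 * j) for j in range(n + 1))


def det_ar1(phi):
    """1 / (1 - phi^2)."""
    return 1 / (1 - phi ** 2)


def det_arma11(phi, theta, n):
    """1 + (phi - theta)^2 (1 + theta^2 + ... + theta^(2n-2)) / (1 - phi^2)."""
    series = sum(theta ** (2 * j) for j in range(n))
    return 1 + (phi - theta) ** 2 * series / (1 - phi ** 2)


n = 6
print(determinant(arma11_v(0.0, 0.4, n)), det_ma1(0.4, n))
print(determinant(arma11_v(0.5, 0.0, n)), det_ar1(0.5))
print(determinant(arma11_v(0.5, 0.3, n)), det_arma11(0.5, 0.3, n))
```

Each printed pair should agree to rounding error.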

Concentrating the likelihood over σ_a² by setting σ_a²=S/m yields a value proportional to $\{ |V|^{1/m} S \}^{-m/2}$.

The default setting of the LIKELIHOOD option is exact. In this case the concentrated likelihood is maximized, by minimizing the quantity

$$ D = |V|^{1/m} S $$

which is called the deviance.
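As a worked illustration, the deviance can be computed by hand for the simplest case discussed earlier: p=1, q=0, d=0, for which W=(1-φ₁²)w₁², t₀=2 and |V|=1/(1-φ₁²). The Python sketch below follows those formulas directly; the function name and the data are hypothetical, and the series w is assumed to have had the constant already subtracted.

```python
def ar1_sum_of_squares_and_deviance(w, phi):
    """Return (S, D) for an AR(1) model with no differencing.

    S = W + sum of a_t^2 for t = 2 ... N, with W = (1 - phi^2) * w_1^2
    D = |V|^(1/m) * S, with |V| = 1 / (1 - phi^2) and m = N
    """
    m = len(w)
    quadratic_form = (1 - phi ** 2) * w[0] ** 2
    innovations = [w[t] - phi * w[t - 1] for t in range(1, m)]
    s = quadratic_form + sum(a * a for a in innovations)
    det_v = 1 / (1 - phi ** 2)
    return s, det_v ** (1 / m) * s


# Invented, already mean-corrected series of length 4.
w = [1.0, 0.5, -0.2, 0.3]
s, d = ar1_sum_of_squares_and_deviance(w, phi=0.5)
print("sum-of-squares function S:", s)
print("deviance D:", d)
```

Because |V| is greater than 1 whenever φ₁ is non-zero, D exceeds S here; the two coincide when φ₁=0.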

The setting leastsquares specifies that Genstat is to minimize only the sum-of-squares term S. This criterion corresponds to the back-forecasting sum-of-squares used by Box & Jenkins (1970), and will in many cases give estimates close to those of the exact likelihood. However, some discrepancy arises if the series is short or the model is close to the invertibility boundary. This is because of limitations on the back-forecasting procedure, as described in the algorithms of Box & Jenkins (1970). The deviance value D that Genstat prints is, with this setting, simply S.

When you use exact likelihood, the factor $|V|^{1/m}$ reduces bias in the estimates of the parameters; you would get bias if you used leastsquares instead. However, $|V|^{1/m}$ is generally close to one, unless the series is short or the model is either seasonal or close to the boundaries of invertibility or stationarity. The leastsquares setting is therefore adequate for most long, non-seasonal sets of data; using it may reduce the computation time by up to 50%. When you specify that Genstat is to estimate the parameter λ of the Box-Cox transformation, Genstat also includes the Jacobian of the transformation in the likelihood function. The result is an extra factor $G^{-2(\lambda-1)}$ in the definition of the deviance, G being the geometric mean of the data,

$$ G = \left( \prod_{t=1}^{N} y_t \right)^{1/N} $$

Note that this is not included unless λ is being estimated, even if λ≠1.
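The size of this extra factor is easy to compute. The Python sketch below evaluates the geometric mean G and the factor G raised to the power -2(λ-1) for a positive series; the function names and data are hypothetical, and the geometric mean is computed through logarithms to avoid overflow in the product.

```python
import math


def geometric_mean(y):
    """Geometric mean of a strictly positive series, computed via logs."""
    return math.exp(sum(math.log(value) for value in y) / len(y))


def jacobian_factor(y, lam):
    """Extra deviance factor G^(-2(lambda - 1)) when lambda is estimated."""
    return geometric_mean(y) ** (-2 * (lam - 1))


y = [1.0, 2.0, 4.0]              # invented data; geometric mean is 2
print(geometric_mean(y))
print(jacobian_factor(y, 1.0))   # lambda = 1 (no transformation): factor is 1
print(jacobian_factor(y, 0.0))   # lambda = 0 (log transformation): factor is G^2
```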

You can treat differences in N log(D) as a chi-square variable in order to test nested models: this is supported by asymptotic theory, and by experience with models that have moderately large sample sizes. Similarly, you can select between different models by using N log(D)+2k as an information criterion, k being the number of estimated parameters. But both of these test procedures are questionable if the estimated models are close to the boundaries of invertibility or stationarity. Provided all the models that are being compared have the same orders of differencing, with the differenced series being of length m, it is recommended that m log(D) be used rather than N log(D) in these tests since m log(D) is precisely minus two multiplied by the log-likelihood as defined above.
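A minimal Python sketch of these comparisons, using m log(D) as recommended, is given below. The deviances, the value of m and the parameter counts are invented for illustration; in practice they would come from two fits with the same orders of differencing. The function names are hypothetical.

```python
import math


def scaled_log_deviance(deviance, m):
    """m * log(D), where m is the length of the differenced series."""
    return m * math.log(deviance)


def nested_test_statistic(deviance_reduced, deviance_full, m):
    """Difference in m * log(D) between a reduced model and a fuller one.

    Treated as a chi-square variable, with degrees of freedom equal to
    the number of extra parameters in the fuller model.
    """
    return (scaled_log_deviance(deviance_reduced, m)
            - scaled_log_deviance(deviance_full, m))


def information_criterion(deviance, m, k):
    """m * log(D) + 2k, with k the number of estimated parameters."""
    return scaled_log_deviance(deviance, m) + 2 * k


m = 131                                    # invented differenced length
dev_reduced, k_reduced = 1.25, 1           # invented fit with one parameter
dev_full, k_full = 1.18, 2                 # invented fit with two parameters

statistic = nested_test_statistic(dev_reduced, dev_full, m)
ic_reduced = information_criterion(dev_reduced, m, k_reduced)
ic_full = information_criterion(dev_full, m, k_full)
print("chi-square statistic on", k_full - k_reduced, "d.f.:", statistic)
print("information criteria:", ic_reduced, ic_full)
```

The model with the smaller information criterion is preferred; with these invented numbers that is the two-parameter model.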

The setting marginal is relevant mainly when TFIT is used for regression with ARIMA errors. (This requires a TRANSFERFUNCTION statement beforehand to specify the explanatory variables.) The likelihood for the model is defined as that of the univariate error series e_t which is defined in general by

$$ e_t = y_t - b_1 x_{1,t} - \cdots - b_m x_{m,t} $$

(the x_i being m explanatory variables). The constant term therefore appears in the model after any differencing of e_t; for example

$$ \nabla e_t = c + (1 - \theta_1 B) a_t $$

You can get bias in the estimates of the parameters of an ARIMA model because the regression is estimated at the same time. You can guard against this by specifying LIKELIHOOD=marginal. This can be particularly important if the series are short or if you use many explanatory variables (Tunnicliffe Wilson 1989). The deviance is now defined as

$$ D = S \left( |X' V^{-1} X| \, |V| \right)^{1/m} $$

where m is reduced by the number of regressors (including the constant term) and the columns of X are the differenced explanatory series: the other terms are as in the exact likelihood.

You can use the marginal setting also for univariate ARIMA modelling, when the constant term is the only explanatory term. Furthermore, Genstat deals with missing values in the response variate by doing a regression on indicator variates; these too are included in the X matrix. However, you cannot use marginal likelihood and estimate a transformation parameter in either the transfer-function model or an ARIMA model. Neither can you use it if you set the FIX option in TFIT. In these cases Genstat automatically resets the LIKELIHOOD option to exact.

At every iteration with the setting LIKELIHOOD=marginal, the regression coefficients are the maximum-likelihood estimates conditional upon the estimated values of the parameters of the ARIMA model: these are also the generalized least-squares estimates, conditioned in the same way. This is so even if MAXCYCLE=0; that is, the coefficients of the regression are re-estimated even at iteration 0. Therefore you must not use the marginal setting with the option METHOD=initialize to initialize for TFORECAST. You can compare deviance values that were obtained using marginal likelihood only for models with the same explanatory variables and the same differencing structure in the error model.

Options: PRINT, LIKELIHOOD, CONSTANT, RECYCLE, WEIGHTS, MVREPLACE, FIX, METHOD, MAXCYCLE, TOLERANCE, SAVE.

Parameters: SERIES, TSM, BOXCOXMETHOD, RESIDUALS.

Action with `RESTRICT`

The SERIES variate can be restricted, but this must be to a contiguous set of units.

References

Box, G.E.P. & Jenkins, G.M. (1970). Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco.

Tunnicliffe Wilson, G. (1989). On the use of marginal likelihood in time-series model estimation. Journal of the Royal Statistical Society, Series B, 51, 15-27.

Example

" Example TFIT-1: Fitting a seasonal ARIMA model"

VARIATE time; VALUES=!(1...120)
FILEREAD [NAME='%gendir%/examples/TFIT-1.DAT'] apt

" Display the correlation structure of the logged data"
CALCULATE lapt = LOG(apt)
BJIDENTIFY [GRAPHICS=high; WINDOWS=!(5,6,7,8)] lapt

" Calculate the autocorrelations of the differences and seasonally
  differenced series"
CALCULATE ddslapt = DIFFERENCE(DIFFERENCE(lapt; 12); 1)
CORRELATE [PRINT=auto; MAXLAG=48] ddslapt; AUTO=ddsr

" Define a model for the series: 
  IMA(1) (that is, a model with a single moving-average parameter
          applied to the differences of the series)
  plus a seasonal IMA(1) component"
TSM [MODELTYPE=arima] airpass; ORDERS=!((0,1,1)2,12)
" Form preliminary estimates of the parameters, using a log transformation
  (BOXCOX=0 is equivalent to log)"
FTSM [PRINT=model] airpass; ddsr; BOXCOX=0
" Get the best estimates, fixing the constant"
TFIT [CONSTANT=fix] SERIES=apt; TSM=airpass

" Graph the residuals against time"
TKEEP RESID=resids
DGRAPH [WINDOW=3; KEYWINDOW=0; TITLE='Residuals vs Time'] resids; time

" Test the independence of the residuals"
CORRELATE [GRAPH=auto; MAXLAG=48] resids; TEST=S
PRINT 'Test statistic for independence of the residuals',S

Updated on June 18, 2019
