TABULATE directive

Forms summary tables of variate values.

Options

`PRINT` = string tokens	Printed output required (`counts`, `totals`, `nobservations`, `means`, `minima`, `maxima`, `variances`, `quantiles`, `sds`, `skewness`, `kurtosis`, `semeans`, `seskewness`, `sekurtosis`); default `*` i.e. no printing
`CLASSIFICATION` = factors	Factors classifying the tables; default `*` i.e. these are taken from the tables in the parameter lists
`COUNTS` = table	Saves a table counting the number of units with each factor combination; default `*`
`SEQUENTIAL` = scalar	Used for sequential formation of tables; a positive value indicates that formation is not yet complete (see `READ);` default `*`
`MARGINS` = string token	Whether the tables should be given margins if not already declared (`yes`, `no`); default `no`
`IPRINT` = string token	Whether to print the identifier of the table or the identifier of the (associated) variate that was used to form it (`identifier, extra, associatedidentifier`); default `iden`
`WEIGHTS` = variate	Weights to be used in the tabulations; default `*` indicates that all units have weight 1
`PERCENTQUANTILES` = scalar or variate	Percentage points for which quantiles are required; default 50 (i.e. median)
`OWN` = scalar or variate	Specifies option settings for the `OWNTAB` subroutine and indicates that this is to supply the data values instead of the variates in the `DATA` list; default `*`
`OWNFACTORS` = factors	Factors whose values are to be read by `OWNTAB` (must include the factors of the classification set); default `*`
`OWNVARIATES` = variates	Variates whose values are to be read by `OWNTAB` (must include the `DATA` variates); default `*`
`INCHANNEL` = scalar	Channel number of the file from which the `OWNTAB` subroutine is to read the data (previously opened by an `OPEN` statement)
`INFILETYPE` = string token	Type of the `OWN` data file (`input, unformatted`); default `inpu`

Parameters

`DATA` = variates	Data values to be tabulated
`TOTALS` = tables	Tables to contain totals
`NOBSERVATIONS` = tables	Tables containing the numbers of non-missing values in each cell
`MEANS` = tables	Tables of means
`MINIMA` = tables	Tables of minimum values in each cell
`MAXIMA` = tables	Tables of maximum values in each cell
`VARIANCES` = tables	Tables of cell variances
`QUANTILES` = tables or pointers	Table to contain quantiles at a single `PERCENTQUANTILE` or pointer of tables for several `PERCENTQUANTILEs` (not available for sequential or `OWN` tabulation)
`SDS` = tables	Tables of standard deviations
`SKEWNESS` = tables	Tables of skewness coefficients
`KURTOSIS` = tables	Tables of kurtosis coefficients
`SEMEANS` = tables	Tables of standard errors of means
`SESKEWNESS` = tables	Tables of standard errors of skewness coefficients
`SEKURTOSIS` = tables	Tables of standard errors of kurtosis coefficients

Description

TABULATE allows you to produce the various types of tabular summary listed in the settings of its PRINT option. The variates whose values are to be summarized are listed with the DATA parameter. If you want to save the summaries in tables, for manipulating or for printing later on, you should list identifiers of the tables in the appropriate parameter list: for example, you would save the totals in a table T by including T in the list for the TOTALS parameter. The other parameters similarly give the other kinds of summary: numbers of non-missing values, means, minima, maxima, variances, quantiles, standard deviations, skewness, kurtosis and (within-cell) standard errors of means, skewness or kurtosis.

If you specify less tables in the lists than the number of DATA variates, Genstat produces accumulated summaries. For example, with

TABULATE Sales2001,Costs2001,Sales2002,Costs2002;\

TOTALS=Totalsales,Totalcosts

the TOTALS list is recycled. So Totalsales will correspond to Sales2001 and Sales2002, and accumulate the totals from both variates. Similarly Totalcost will contain the totals from the variates Costs2001 and Costs2002. To avoid confusion, however, you are not allowed to specify table lists with differing lengths.

The simplest quantile, and the one produced by default, is the median (50% quantile), but the PERCENTQUANTILE option allows you to request any percentage point (between 0 and 100, of course). Moreover, by specifying a variate as the setting for PERCENTQUANTILE, you can obtain several quantiles at the same time. However, if you then want to save the results the setting of the QUANTILE parameter must be a pointer with length equal to the required number of quantiles, instead of a single table.

If you merely want to print the summaries, you do not usually need to list any tables; you need only specify the PRINT option. The only exception to this is with sequential tabulation, described at the end of this subsection.

The CLASSIFICATION option defines the classifying factors for the tables. This need not be set if at least one of the tables has already been declared (but then all the declared tables must have the same classifying factors). The MARGINS option determines whether or not the tables will have margins, if none have already been declared (and those that have been declared must be either all with margins or all without margins).

In the tables that correspond to the parameters of TABULATE, missing values of the data variates are ignored. So the NOBSERVATIONS parameter and the nobservations setting of the PRINT option provide the numbers of non-missing units of the data variates for each factor combination. You can however obtain a count of the numbers of units that would have contributed to each group if no values had been missing: you use the COUNTS option if you want to save the table, or put PRINT=counts if you want to print it. If any of the factor values are missing Genstat ascribes the corresponding units to the unknown cell associated with the table (see the TABLE directive).

If there are no observations in one of the groups, the corresponding cell will be zero in a table of numbers of observations or counts; in a table of totals, means, minima, maxima, variances, standard deviations, skewness, kurtosis or standard errors of means, skewness or kurtosis the cell will contain a missing value.

Weighted tables can be obtained by setting the WEIGHT option to a variate of weights. You can, in general, think of weights as a set of multipliers which are applied to the data before any operations are performed. Thus, for most aspects of weighted tabulation you can replace x by wx and 1 by w (that is, n by Σw) in the standard formulae; see the table below. This is not what happens in the case of variances, standard deviations (which are square roots of the variances) and quantiles, but it is true for the other functions (including counts).

	Unweighted	Weighted
Count	n	Σ w
Total	Σ x	Σ wx
Nobservations	n	Σ w (x not missing)
Mean	Σ x/n	Σ wx / Σ w
Minimum	Min( x )	Min( wx )
Maximum	Max( x )	Max( wx )
Variance	Σ(x – (Σx/n))² / n-1	Σw(x – (Σwx/Σw))² / Σ w-1
Skewness	Σ(x – (Σx/n))³ / ( Σ(x – (Σx/n))² )^3/2	Σ w (x – (Σwx/Σw))³ / ( Σ w (x – (Σwx/Σw))² )^3/2
Kurtosis	Σ(x – (Σx/n))⁴ / ( Σ(x – (Σx/n))² )² – 3	Σ w (x – (Σwx/Σw))⁴ / Σ w (x – (Σwx/Σw))² )² – 3
s.e. skewness	√( { 6n × (n-1) } / { (n-2) × (n+1) × (n+3) } )	√( { 6Σw × (Σw – 1) } / { (Σw – 2) × (Σw + 1) × (Σw + 3) } ) (x not missing)
s.e. kurtosis	√( { 24 × n × (n-1)² } / { (n-2) × (n-3) × (n+5) × (n+3) } )	√( { 24 × Σw × (Σw – 1)² } / { (Σw – 2) × (Σw – 3) × (Σw + 5) × (Σw + 3) } ) (x not missing)

A quick look at the formula used for the weighted variance (or the standard deviation) or skewness or kurtosis shows that it breaks down for Σw<1; in fact it is valid only when the weights are integer values greater than or equal to zero. Similarly, with quantiles the weights are assumed to specify replicated observations; so these must also be non-negative integers. If an invalid weight is found during the calculation of a variance, skewness, kurtosis or quantile a fault will be reported. Temporary tables will be deleted, but named tables may contain partial results. However, non-integer weights are allowed in other contexts. The standard deviation is the square root of the variance, and the standard error of the mean is the standard deviation divided by the square root of the number of observations.

If you have many observations to summarize, there may be insufficient space within Genstat for you to read them all and then form the tables. To cater for such situations, Genstat allows you to process the data in sections, using the SEQUENTIAL option of TABULATE in conjunction with the SEQUENTIAL option of READ. After READ, the absolute value of the option indicates the number of units that have been read in this particular section; the value is positive during interim sections and negative or zero once the terminator at the end of the data is reached. TABULATE will not print any tables until the final section has been processed. If you want to see the intermediate tables, you can include a PRINT statement after the TABULATE statement. To allow Genstat to keep contact with the working tables in which the results are accumulating, you must save at least one out of the various types of table for every DATA variate. Genstat can then link the working tables to this named table during the course of the sequential tabulation, so that the information is not lost between the successive uses of TABULATE.

The final five options of TABULATE (OWN, OWNFACTORS, OWNVARIATES, INCHANNEL and INFILETYPE) allow you to link your own Fortran subroutine, G5XZIT, to Genstat to allow you to handle complicated arrangements of data, as can occur for example in hierarchical surveys. To implement this, you must get access to some of the Genstat source code. The relevant section of the code is named Module X, and is distributed with Genstat to all sites, probably in a file called X.FOR. The documentation of G5XZIT is included with the Fortran and so is not repeated here. G5XZIT is thus a Fortran subprogram, to be modified by you, which is called from within TABULATE for each unit to be tabulated. It contains switches to tell TABULATE when a data error occurs or when all the data have been read. To use it you have to link your own version of Genstat, as when using the OWN directive. Then your version of G5XZIT will be used instead of the standard version supplied as part of Genstat.

The subprogram can be as simple or as complicated as you like (or need), provided it obeys a few simple rules. A very simple version, reading two variates and two factors, is supplied with Genstat. This should provide sufficient information for you to write your own version, and link it into your own private version of Genstat.

The OWN option should be set to a variate allowing you to communicate between your Genstat code and your G5XZIT subprogram. The OWNFACTORS option provides the list of factors to be read by G5XZIT. It must include the classifying factors needed in the current TABULATE instruction, but it may contain others as well. The OWNVARIATES option should provide a similar list of variates. The INCHANNEL option should be set to the Genstat channel number of the data file, as specified in a previous OPEN statement or in the Genstat command line. The INFILETYPE option specifies whether the data file is character (input) or binary (unformatted).

TABULATE allows only one classification set to be used at a time. If the data set is complicated enough to require G5XZIT, then several tabulations with different classifying sets are likely to be needed. Rather than have a separate branch in G5XZIT for each tabulation, you can put all the factors and all the variates that you will need into the settings of the OWNFACTORS and OWNVARIATES options, and leave TABULATE to extract the ones it needs each time. If you have several TABULATE statements as suggested, you will have to close the data file and re-open it between them.

Options: PRINT, CLASSIFICATION, COUNTS, SEQUENTIAL, MARGINS, IPRINT, WEIGHTS, PERCENTQUANTILES, OWN, OWNFACTORS, OWNVARIATES, INCHANNEL, INFILETYPE.

Parameters: DATA, TOTALS, NOBSERVATIONS, MEANS, MINIMA, MAXIMA, VARIANCES, QUANTILES, SDS, SKEWNESS, KURTOSIS, SEMEANS, SESKEWNESS, SEKURTOSIS.

Action with `RESTRICT`

If any of the DATA variates, or the WEIGHTS variate, or any of the classifying factors of the tables is restricted, TABULATE will form the tables using only the defined subset of units. If more than one variate or factor is restricted, the restrictions must be the same.

Example

" Examples 1:4.11.1a-b, 1:4.11.3a-b, 1:4.11.6a-b, 1:4.11.7 "
VARIATE [NVALUES=15] Quantity,Charge
FACTOR [NVALUES=15; LABELS=!T(A,B)] Type
& [LABELS=!T(London,Manchester,Birmingham,Bristol)] Town
READ [PRINT=data,errors] Town,Quantity,Type; FREPRESENTATION=labels
    London 10 A  Manchester   5 B  Birmingham  10 B     Bristol 25 A  
Manchester 10 *  Birmingham 100 B      London 200 B  Manchester 25 A  
   Bristol 50 A  Birmingham  25 A     Bristol  25 B      London 25 A 
    London 50 B  Manchester  25 B      London  50 A  :
TABLE [CLASSIFICATION=Town,Type] Totdisp; UNKNOWN=Udisp
TABULATE Quantity; TOTALS=Totdisp
PRINT Totdisp; DECIMALS=0
READ [PRINT=data,errors] Charge
10 20 15 15 * 60 80 30 25 15 25 15 40 * * :
TABULATE [CLASSIFICATION=Town; COUNTS=Nconsign] DATA=Charge; \
  TOTALS=Payment; NOBSERVATIONS=Invoices
PRINT Nconsign,Invoices,Payment; DECIMALS=0,0,2
PERCENT Totdisp; NEWTABLE=Totdisp%
PRINT [PUNKNOWN=never] Totdisp%; DECIMALS=2
T%CONTROL Totdisp; FACTOR=Town; CONTROL='London'
TABSORT [PRINT=tables; DIRECTION=descending] Invoices; DECIMALS=0
POINTER [VALUES=Town,Type] Classification
POINTER [VALUES=SortedTowns,SortedTypes] Newclassification
TABSORT [DIRECTION=descending; FACTORS=Classification;\
        NEWFACTORS=Newclassification] Totdisp%; Sorted%Totals
PRINT   Sorted%Totals
VARIATE [VALUES=1...4] Newx
DTABLE  Sorted%Totals; XFACTOR=SortedTowns; NEWXLEVELS=Newx;\
        GROUPS=SortedTypes

Updated on January 12, 2022

Was this article helpful?

Yes No