Forms summary tables of variate values.

### Options

### Parameters

`DATA` = variates |
Data values to be tabulated |
---|---|

`TOTALS` = tables |
Tables to contain totals |

`NOBSERVATIONS` = tables |
Tables containing the numbers of non-missing values in each cell |

`MEANS` = tables |
Tables of means |

`MINIMA` = tables |
Tables of minimum values in each cell |

`MAXIMA` = tables |
Tables of maximum values in each cell |

`VARIANCES` = tables |
Tables of cell variances |

`QUANTILES` = tables or pointers |
Table to contain quantiles at a single `PERCENTQUANTILE` or pointer of tables for several `PERCENTQUANTILEs` (not available for sequential or `OWN` tabulation) |

`SDS` = tables |
Tables of standard deviations |

`SKEWNESS` = tables |
Tables of skewness coefficients |

`KURTOSIS` = tables |
Tables of kurtosis coefficients |

`SEMEANS` = tables |
Tables of standard errors of means |

`SESKEWNESS` = tables |
Tables of standard errors of skewness coefficients |

`SEKURTOSIS` = tables |
Tables of standard errors of kurtosis coefficients |

### Description

`TABULATE`

allows you to produce the various types of tabular summary listed in the settings of its `PRINT`

option. The variates whose values are to be summarized are listed with the `DATA`

parameter. If you want to save the summaries in tables, for manipulating or for printing later on, you should list identifiers of the tables in the appropriate parameter list: for example, you would save the totals in a table `T`

by including `T`

in the list for the `TOTALS`

parameter. The other parameters similarly give the other kinds of summary: numbers of non-missing values, means, minima, maxima, variances, quantiles, standard deviations, skewness, kurtosis and (within-cell) standard errors of means, skewness or kurtosis.

If you specify less tables in the lists than the number of `DATA`

variates, Genstat produces accumulated summaries. For example, with

`TABULATE Sales2001,Costs2001,Sales2002,Costs2002;\`

` TOTALS=Totalsales,Totalcosts`

the `TOTALS`

list is recycled. So `Totalsales`

will correspond to `Sales2001`

and `Sales2002`

, and accumulate the totals from both variates. Similarly `Totalcost`

will contain the totals from the variates Costs2001 and Costs2002. To avoid confusion, however, you are not allowed to specify table lists with differing lengths.

The simplest quantile, and the one produced by default, is the median (50% quantile), but the `PERCENTQUANTILE`

option allows you to request any percentage point (between 0 and 100, of course). Moreover, by specifying a variate as the setting for `PERCENTQUANTILE`

, you can obtain several quantiles at the same time. However, if you then want to save the results the setting of the `QUANTILE`

parameter must be a pointer with length equal to the required number of quantiles, instead of a single table.

If you merely want to print the summaries, you do not usually need to list any tables; you need only specify the `PRINT`

option. The only exception to this is with sequential tabulation, described at the end of this subsection.

The `CLASSIFICATION`

option defines the classifying factors for the tables. This need not be set if at least one of the tables has already been declared (but then all the declared tables must have the same classifying factors). The `MARGINS`

option determines whether or not the tables will have margins, if none have already been declared (and those that have been declared must be either all with margins or all without margins).

In the tables that correspond to the parameters of `TABULATE`

, missing values of the data variates are ignored. So the `NOBSERVATIONS`

parameter and the `nobservations`

setting of the `PRINT`

option provide the numbers of non-missing units of the data variates for each factor combination. You can however obtain a count of the numbers of units that would have contributed to each group if no values had been missing: you use the `COUNTS`

option if you want to save the table, or put `PRINT=counts`

if you want to print it. If any of the factor values are missing Genstat ascribes the corresponding units to the *unknown* cell associated with the table (see the `TABLE`

directive).

If there are no observations in one of the groups, the corresponding cell will be zero in a table of numbers of observations or counts; in a table of totals, means, minima, maxima, variances, standard deviations, skewness, kurtosis or standard errors of means, skewness or kurtosis the cell will contain a missing value.

Weighted tables can be obtained by setting the `WEIGHT`

option to a variate of weights. You can, in general, think of weights as a set of multipliers which are applied to the data before any operations are performed. Thus, for most aspects of weighted tabulation you can replace *x* by *wx* and 1 by *w* (that is, *n* by Σ*w*) in the standard formulae; see the table below. This is not what happens in the case of variances, standard deviations (which are square roots of the variances) and quantiles, but it is true for the other functions (including counts).

Unweighted | Weighted | |

Count | n |
Σ w |

Total | Σ x |
Σ wx |

Nobservations | n |
Σ w (x not missing) |

Mean | Σ x/n |
Σ wx / Σ w |

Minimum | Min( x ) |
Min( wx ) |

Maximum | Max( x ) |
Max( wx ) |

Variance | Σ(x – (Σx/n))^{2} / n-1 |
Σw(x – (Σwx/Σw))^{2} / Σ w-1 |

Skewness |
Σ( / ( Σ( |
Σ / ( Σ |

Kurtosis |
Σ( / ( Σ( |
Σ / Σ |

s.e. skewness |
√( { 6 / { ( |
√( { 6Σ / { (Σ (Σ ( |

s.e. kurtosis |
√( { 24 × / { ( ( |
√( { 24 × Σ / { (Σ (Σ ( |

A quick look at the formula used for the weighted variance (or the standard deviation) or skewness or kurtosis shows that it breaks down for Σ*w*<1; in fact it is valid only when the weights are integer values greater than or equal to zero. Similarly, with quantiles the weights are assumed to specify replicated observations; so these must also be non-negative integers. If an invalid weight is found during the calculation of a variance, skewness, kurtosis or quantile a fault will be reported. Temporary tables will be deleted, but named tables may contain partial results. However, non-integer weights are allowed in other contexts. The standard deviation is the square root of the variance, and the standard error of the mean is the standard deviation divided by the square root of the number of observations.

If you have many observations to summarize, there may be insufficient space within Genstat for you to read them all and then form the tables. To cater for such situations, Genstat allows you to process the data in sections, using the `SEQUENTIAL`

option of `TABULATE`

in conjunction with the `SEQUENTIAL`

option of `READ`

. After `READ`

, the absolute value of the option indicates the number of units that have been read in this particular section; the value is positive during interim sections and negative or zero once the terminator at the end of the data is reached. `TABULATE`

will not print any tables until the final section has been processed. If you want to see the intermediate tables, you can include a `PRINT`

statement after the `TABULATE`

statement. To allow Genstat to keep contact with the working tables in which the results are accumulating, you must save at least one out of the various types of table for every `DATA`

variate. Genstat can then link the working tables to this named table during the course of the sequential tabulation, so that the information is not lost between the successive uses of `TABULATE`

.

The final five options of `TABULATE`

(`OWN`

, `OWNFACTORS`

, `OWNVARIATES`

, `INCHANNEL`

and `INFILETYPE`

) allow you to link your own Fortran subroutine, `G5XZIT`

, to Genstat to allow you to handle complicated arrangements of data, as can occur for example in hierarchical surveys. To implement this, you must get access to some of the Genstat source code. The relevant section of the code is named Module X, and is distributed with Genstat to all sites, probably in a file called X.FOR. The documentation of G5XZIT is included with the Fortran and so is not repeated here. G5XZIT is thus a Fortran subprogram, to be modified by you, which is called from within `TABULATE`

for each unit to be tabulated. It contains switches to tell `TABULATE`

when a data error occurs or when all the data have been read. To use it you have to link your own version of Genstat, as when using the `OWN`

directive. Then your version of G5XZIT will be used instead of the standard version supplied as part of Genstat.

The subprogram can be as simple or as complicated as you like (or need), provided it obeys a few simple rules. A very simple version, reading two variates and two factors, is supplied with Genstat. This should provide sufficient information for you to write your own version, and link it into your own private version of Genstat.

The `OWN`

option should be set to a variate allowing you to communicate between your Genstat code and your G5XZIT subprogram. The `OWNFACTORS`

option provides the list of factors to be read by G5XZIT. It must include the classifying factors needed in the current `TABULATE`

instruction, but it may contain others as well. The `OWNVARIATES`

option should provide a similar list of variates. The `INCHANNEL`

option should be set to the Genstat channel number of the data file, as specified in a previous `OPEN`

statement or in the Genstat command line. The `INFILETYPE`

option specifies whether the data file is character (input) or binary (unformatted).

`TABULATE`

allows only one classification set to be used at a time. If the data set is complicated enough to require G5XZIT, then several tabulations with different classifying sets are likely to be needed. Rather than have a separate branch in G5XZIT for each tabulation, you can put all the factors and all the variates that you will need into the settings of the `OWNFACTORS`

and `OWNVARIATES`

options, and leave `TABULATE`

to extract the ones it needs each time. If you have several `TABULATE`

statements as suggested, you will have to close the data file and re-open it between them.

Options: `PRINT`

, `CLASSIFICATION`

, `COUNTS`

, `SEQUENTIAL`

, `MARGINS`

, `IPRINT`

, `WEIGHTS`

, `PERCENTQUANTILES`

, `OWN`

, `OWNFACTORS`

, `OWNVARIATES`

, `INCHANNEL`

, `INFILETYPE`

.

Parameters: `DATA`

, `TOTALS`

, `NOBSERVATIONS`

, `MEANS`

, `MINIMA`

, `MAXIMA`

, `VARIANCES`

, `QUANTILES`

, `SDS`

, `SKEWNESS`

, `KURTOSIS`

, `SEMEANS`

, `SESKEWNESS`

, `SEKURTOSIS`

.

### Action with `RESTRICT`

If any of the `DATA`

variates, or the `WEIGHTS`

variate, or any of the classifying factors of the tables is restricted, `TABULATE`

will form the tables using only the defined subset of units. If more than one variate or factor is restricted, the restrictions must be the same.

### See also

Directives: `TABLE`

, `MARGIN`

, `COMBINE`

.

Procedures: `FBETWEENGROUPVECTORS`

, `MTABULATE`

, `PERCENT`

, `SVSTRATIFIED`

, `SVTABULATE`

, `TABINSERT`

, `TABMODE`

, `TABSORT`

, `VSUMMARY`

.

Commands for: Basic and nonparametric statistics, Calculations and manipulation, Survey analysis.

### Example

" Examples 1:4.11.1a-b, 1:4.11.3a-b, 1:4.11.6a-b, 1:4.11.7 " VARIATE [NVALUES=15] Quantity,Charge FACTOR [NVALUES=15; LABELS=!T(A,B)] Type & [LABELS=!T(London,Manchester,Birmingham,Bristol)] Town READ [PRINT=data,errors] Town,Quantity,Type; FREPRESENTATION=labels London 10 A Manchester 5 B Birmingham 10 B Bristol 25 A Manchester 10 * Birmingham 100 B London 200 B Manchester 25 A Bristol 50 A Birmingham 25 A Bristol 25 B London 25 A London 50 B Manchester 25 B London 50 A : TABLE [CLASSIFICATION=Town,Type] Totdisp; UNKNOWN=Udisp TABULATE Quantity; TOTALS=Totdisp PRINT Totdisp; DECIMALS=0 READ [PRINT=data,errors] Charge 10 20 15 15 * 60 80 30 25 15 25 15 40 * * : TABULATE [CLASSIFICATION=Town; COUNTS=Nconsign] DATA=Charge; \ TOTALS=Payment; NOBSERVATIONS=Invoices PRINT Nconsign,Invoices,Payment; DECIMALS=0,0,2 PERCENT Totdisp; NEWTABLE=Totdisp% PRINT [PUNKNOWN=never] Totdisp%; DECIMALS=2 T%CONTROL Totdisp; FACTOR=Town; CONTROL='London' TABSORT [PRINT=tables; DIRECTION=descending] Invoices; DECIMALS=0 POINTER [VALUES=Town,Type] Classification POINTER [VALUES=SortedTowns,SortedTypes] Newclassification TABSORT [DIRECTION=descending; FACTORS=Classification;\ NEWFACTORS=Newclassification] Totdisp%; Sorted%Totals PRINT Sorted%Totals VARIATE [VALUES=1...4] Newx DTABLE Sorted%Totals; XFACTOR=SortedTowns; NEWXLEVELS=Newx;\ GROUPS=SortedTypes