Forms summary tables of variate values.
Options
Parameters
DATA = variates |
Data values to be tabulated |
---|---|
TOTALS = tables |
Tables to contain totals |
NOBSERVATIONS = tables |
Tables containing the numbers of non-missing values in each cell |
MEANS = tables |
Tables of means |
MINIMA = tables |
Tables of minimum values in each cell |
MAXIMA = tables |
Tables of maximum values in each cell |
VARIANCES = tables |
Tables of cell variances |
QUANTILES = tables or pointers |
Table to contain quantiles at a single PERCENTQUANTILE or pointer of tables for several PERCENTQUANTILEs (not available for sequential or OWN tabulation) |
SDS = tables |
Tables of standard deviations |
SKEWNESS = tables |
Tables of skewness coefficients |
KURTOSIS = tables |
Tables of kurtosis coefficients |
SEMEANS = tables |
Tables of standard errors of means |
SESKEWNESS = tables |
Tables of standard errors of skewness coefficients |
SEKURTOSIS = tables |
Tables of standard errors of kurtosis coefficients |
Description
TABULATE
allows you to produce the various types of tabular summary listed in the settings of its PRINT
option. The variates whose values are to be summarized are listed with the DATA
parameter. If you want to save the summaries in tables, for manipulating or for printing later on, you should list identifiers of the tables in the appropriate parameter list: for example, you would save the totals in a table T
by including T
in the list for the TOTALS
parameter. The other parameters similarly give the other kinds of summary: numbers of non-missing values, means, minima, maxima, variances, quantiles, standard deviations, skewness, kurtosis and (within-cell) standard errors of means, skewness or kurtosis.
If you specify less tables in the lists than the number of DATA
variates, Genstat produces accumulated summaries. For example, with
TABULATE Sales2001,Costs2001,Sales2002,Costs2002;\
TOTALS=Totalsales,Totalcosts
the TOTALS
list is recycled. So Totalsales
will correspond to Sales2001
and Sales2002
, and accumulate the totals from both variates. Similarly Totalcost
will contain the totals from the variates Costs2001 and Costs2002. To avoid confusion, however, you are not allowed to specify table lists with differing lengths.
The simplest quantile, and the one produced by default, is the median (50% quantile), but the PERCENTQUANTILE
option allows you to request any percentage point (between 0 and 100, of course). Moreover, by specifying a variate as the setting for PERCENTQUANTILE
, you can obtain several quantiles at the same time. However, if you then want to save the results the setting of the QUANTILE
parameter must be a pointer with length equal to the required number of quantiles, instead of a single table.
If you merely want to print the summaries, you do not usually need to list any tables; you need only specify the PRINT
option. The only exception to this is with sequential tabulation, described at the end of this subsection.
The CLASSIFICATION
option defines the classifying factors for the tables. This need not be set if at least one of the tables has already been declared (but then all the declared tables must have the same classifying factors). The MARGINS
option determines whether or not the tables will have margins, if none have already been declared (and those that have been declared must be either all with margins or all without margins).
In the tables that correspond to the parameters of TABULATE
, missing values of the data variates are ignored. So the NOBSERVATIONS
parameter and the nobservations
setting of the PRINT
option provide the numbers of non-missing units of the data variates for each factor combination. You can however obtain a count of the numbers of units that would have contributed to each group if no values had been missing: you use the COUNTS
option if you want to save the table, or put PRINT=counts
if you want to print it. If any of the factor values are missing Genstat ascribes the corresponding units to the unknown cell associated with the table (see the TABLE
directive).
If there are no observations in one of the groups, the corresponding cell will be zero in a table of numbers of observations or counts; in a table of totals, means, minima, maxima, variances, standard deviations, skewness, kurtosis or standard errors of means, skewness or kurtosis the cell will contain a missing value.
Weighted tables can be obtained by setting the WEIGHT
option to a variate of weights. You can, in general, think of weights as a set of multipliers which are applied to the data before any operations are performed. Thus, for most aspects of weighted tabulation you can replace x by wx and 1 by w (that is, n by Σw) in the standard formulae; see the table below. This is not what happens in the case of variances, standard deviations (which are square roots of the variances) and quantiles, but it is true for the other functions (including counts).
Unweighted | Weighted | |
Count | n | Σ w |
Total | Σ x | Σ wx |
Nobservations | n | Σ w (x not missing) |
Mean | Σ x/n | Σ wx / Σ w |
Minimum | Min( x ) | Min( wx ) |
Maximum | Max( x ) | Max( wx ) |
Variance | Σ(x – (Σx/n))2 / n-1 | Σw(x – (Σwx/Σw))2 / Σ w-1 |
Skewness |
Σ(x – (Σx/n))3 / ( Σ(x – (Σx/n))2 )3/2 |
Σ w (x – (Σwx/Σw))3 / ( Σ w (x – (Σwx/Σw))2 )3/2 |
Kurtosis |
Σ(x – (Σx/n))4 / ( Σ(x – (Σx/n))2 )2 – 3 |
Σ w (x – (Σwx/Σw))4 / Σ w (x – (Σwx/Σw))2 )2 – 3 |
s.e. skewness |
√( { 6n × (n-1) } / { (n-2) × (n+1) × (n+3) } ) |
√( { 6Σw × (Σw – 1) } / { (Σw – 2) × (Σw + 1) × (Σw + 3) } ) (x not missing) |
s.e. kurtosis |
√( { 24 × n × (n-1)2 } / { (n-2) × (n-3) × (n+5) × (n+3) } ) |
√( { 24 × Σw × (Σw – 1)2 } / { (Σw – 2) × (Σw – 3) × (Σw + 5) × (Σw + 3) } ) (x not missing) |
A quick look at the formula used for the weighted variance (or the standard deviation) or skewness or kurtosis shows that it breaks down for Σw<1; in fact it is valid only when the weights are integer values greater than or equal to zero. Similarly, with quantiles the weights are assumed to specify replicated observations; so these must also be non-negative integers. If an invalid weight is found during the calculation of a variance, skewness, kurtosis or quantile a fault will be reported. Temporary tables will be deleted, but named tables may contain partial results. However, non-integer weights are allowed in other contexts. The standard deviation is the square root of the variance, and the standard error of the mean is the standard deviation divided by the square root of the number of observations.
If you have many observations to summarize, there may be insufficient space within Genstat for you to read them all and then form the tables. To cater for such situations, Genstat allows you to process the data in sections, using the SEQUENTIAL
option of TABULATE
in conjunction with the SEQUENTIAL
option of READ
. After READ
, the absolute value of the option indicates the number of units that have been read in this particular section; the value is positive during interim sections and negative or zero once the terminator at the end of the data is reached. TABULATE
will not print any tables until the final section has been processed. If you want to see the intermediate tables, you can include a PRINT
statement after the TABULATE
statement. To allow Genstat to keep contact with the working tables in which the results are accumulating, you must save at least one out of the various types of table for every DATA
variate. Genstat can then link the working tables to this named table during the course of the sequential tabulation, so that the information is not lost between the successive uses of TABULATE
.
The final five options of TABULATE
(OWN
, OWNFACTORS
, OWNVARIATES
, INCHANNEL
and INFILETYPE
) allow you to link your own Fortran subroutine, G5XZIT
, to Genstat to allow you to handle complicated arrangements of data, as can occur for example in hierarchical surveys. To implement this, you must get access to some of the Genstat source code. The relevant section of the code is named Module X, and is distributed with Genstat to all sites, probably in a file called X.FOR. The documentation of G5XZIT is included with the Fortran and so is not repeated here. G5XZIT is thus a Fortran subprogram, to be modified by you, which is called from within TABULATE
for each unit to be tabulated. It contains switches to tell TABULATE
when a data error occurs or when all the data have been read. To use it you have to link your own version of Genstat, as when using the OWN
directive. Then your version of G5XZIT will be used instead of the standard version supplied as part of Genstat.
The subprogram can be as simple or as complicated as you like (or need), provided it obeys a few simple rules. A very simple version, reading two variates and two factors, is supplied with Genstat. This should provide sufficient information for you to write your own version, and link it into your own private version of Genstat.
The OWN
option should be set to a variate allowing you to communicate between your Genstat code and your G5XZIT subprogram. The OWNFACTORS
option provides the list of factors to be read by G5XZIT. It must include the classifying factors needed in the current TABULATE
instruction, but it may contain others as well. The OWNVARIATES
option should provide a similar list of variates. The INCHANNEL
option should be set to the Genstat channel number of the data file, as specified in a previous OPEN
statement or in the Genstat command line. The INFILETYPE
option specifies whether the data file is character (input) or binary (unformatted).
TABULATE
allows only one classification set to be used at a time. If the data set is complicated enough to require G5XZIT, then several tabulations with different classifying sets are likely to be needed. Rather than have a separate branch in G5XZIT for each tabulation, you can put all the factors and all the variates that you will need into the settings of the OWNFACTORS
and OWNVARIATES
options, and leave TABULATE
to extract the ones it needs each time. If you have several TABULATE
statements as suggested, you will have to close the data file and re-open it between them.
Options: PRINT
, CLASSIFICATION
, COUNTS
, SEQUENTIAL
, MARGINS
, IPRINT
, WEIGHTS
, PERCENTQUANTILES
, OWN
, OWNFACTORS
, OWNVARIATES
, INCHANNEL
, INFILETYPE
.
Parameters: DATA
, TOTALS
, NOBSERVATIONS
, MEANS
, MINIMA
, MAXIMA
, VARIANCES
, QUANTILES
, SDS
, SKEWNESS
, KURTOSIS
, SEMEANS
, SESKEWNESS
, SEKURTOSIS
.
Action with RESTRICT
If any of the DATA
variates, or the WEIGHTS
variate, or any of the classifying factors of the tables is restricted, TABULATE
will form the tables using only the defined subset of units. If more than one variate or factor is restricted, the restrictions must be the same.
See also
Directives: TABLE
, MARGIN
, COMBINE
.
Procedures: FBETWEENGROUPVECTORS
, MTABULATE
, PERCENT
, SVSTRATIFIED
, SVTABULATE
, TABINSERT
, TABMODE
, TABSORT
, VSUMMARY
.
Commands for: Basic and nonparametric statistics, Calculations and manipulation, Survey analysis.
Example
" Examples 1:4.11.1a-b, 1:4.11.3a-b, 1:4.11.6a-b, 1:4.11.7 " VARIATE [NVALUES=15] Quantity,Charge FACTOR [NVALUES=15; LABELS=!T(A,B)] Type & [LABELS=!T(London,Manchester,Birmingham,Bristol)] Town READ [PRINT=data,errors] Town,Quantity,Type; FREPRESENTATION=labels London 10 A Manchester 5 B Birmingham 10 B Bristol 25 A Manchester 10 * Birmingham 100 B London 200 B Manchester 25 A Bristol 50 A Birmingham 25 A Bristol 25 B London 25 A London 50 B Manchester 25 B London 50 A : TABLE [CLASSIFICATION=Town,Type] Totdisp; UNKNOWN=Udisp TABULATE Quantity; TOTALS=Totdisp PRINT Totdisp; DECIMALS=0 READ [PRINT=data,errors] Charge 10 20 15 15 * 60 80 30 25 15 25 15 40 * * : TABULATE [CLASSIFICATION=Town; COUNTS=Nconsign] DATA=Charge; \ TOTALS=Payment; NOBSERVATIONS=Invoices PRINT Nconsign,Invoices,Payment; DECIMALS=0,0,2 PERCENT Totdisp; NEWTABLE=Totdisp% PRINT [PUNKNOWN=never] Totdisp%; DECIMALS=2 T%CONTROL Totdisp; FACTOR=Town; CONTROL='London' TABSORT [PRINT=tables; DIRECTION=descending] Invoices; DECIMALS=0 POINTER [VALUES=Town,Type] Classification POINTER [VALUES=SortedTowns,SortedTypes] Newclassification TABSORT [DIRECTION=descending; FACTORS=Classification;\ NEWFACTORS=Newclassification] Totdisp%; Sorted%Totals PRINT Sorted%Totals VARIATE [VALUES=1...4] Newx DTABLE Sorted%Totals; XFACTOR=SortedTowns; NEWXLEVELS=Newx;\ GROUPS=SortedTypes