
VSUMMARY procedure

Summarizes a variate, with classifying factors, into a data matrix of variates and factors (D.B. Baird).

Options

PRINT = string token What to print (summaries); default * i.e. none
CLASSIFICATION = factors Factors classifying the summary groups
NEWCLASSIFICATION = factors Factors in the data matrix to classify the output variates
REDEFINE = string token Whether to redefine the CLASSIFICATION factors and DATA variates, if NEWCLASSIFICATION or NEWDATA are not set (yes, no); default no
CMETHOD = string token How to form levels for carried factors (median, minimum, maximum); default median
MVINCLUDE = string token Whether to include factor combinations with no observations in summaries (yes, no); default no
WARNING = string token What warnings to output (carry); default carry, which warns when carried factors vary within the classification groups

Parameters

DATA = variates, factors or pointers Data to be summarized
STATISTIC = texts What statistic to calculate (carry, counts, sums, totals, nobservations, means, medians, minima, maxima, variances, quantiles, sds, skewness, kurtosis, semeans, seskewness, sekurtosis); default means
PERCENTILE = scalars or variates Percentile to be used for quantiles; default 50
NEWDATA = variates, factors or pointers Saves the summary statistics, as variates (or as factors for STATISTIC=carry)

Description

VSUMMARY forms data matrices containing summary statistics rather than the usual tables created by TABULATE. This can be useful if the summary statistics are to be used in a further analysis (e.g. an analysis of variance).

The CLASSIFICATION option specifies the classifying factors for the summaries, and the DATA parameter supplies the variates or factors to be summarized. The STATISTIC parameter specifies the type of numerical summary: counts, totals, numbers of non-missing values, means, medians, minima, maxima, variances, quantiles, standard deviations, skewness and kurtosis coefficients, and (within-cell) standard errors of means, skewness and kurtosis. The statistic sums is a synonym for totals. When STATISTIC=quantiles, the PERCENTILE parameter specifies the quantile to be calculated, as a percentage between 0 and 100.

The statistic carry, which applies only to factors, creates summary factors whose levels are those occurring in each group. For example, in a field trial with repeated measurements on each plot, you may want to carry across the factors that identify the replicate and treatment of each plot. If a carried factor varies within a classification group, a warning is given when WARNING=carry (the default); this can be suppressed by setting WARNING=*. For such groups, the CMETHOD option controls which level is used in the summary: the median, minimum or maximum of the levels present within the group.
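The field-trial case described above might be sketched as follows. This is an untested outline: the identifiers Plot, Rep, Treatment, Yield, plot, rep, treatment and plotyield, and all the data values, are hypothetical and chosen only for illustration. Each of four plots is measured twice; VSUMMARY carries the replicate and treatment factors across to one unit per plot and averages the yield, and the plot means are then analysed with ANOVA.

```genstat
" Hypothetical trial: 4 plots in 2 replicates, each plot measured twice "
FACTOR     [NVALUES=8; LEVELS=4] Plot; VALUES=!(1,1,2,2,3,3,4,4)
FACTOR     [NVALUES=8; LEVELS=2] Rep; VALUES=!(1,1,1,1,2,2,2,2)
FACTOR     [NVALUES=8; LEVELS=2] Treatment; VALUES=!(1,1,2,2,1,1,2,2)
VARIATE    [NVALUES=8] Yield; VALUES=!(4.1,4.5,5.0,5.6,3.9,4.2,5.3,5.1)

" Carry Rep and Treatment (constant within each plot) and form plot means of Yield "
VSUMMARY   [CLASSIFICATION=Plot; NEWCLASSIFICATION=plot] \
           Rep,Treatment,Yield; STATISTIC='carry','carry','means'; \
           NEWDATA=rep,treatment,plotyield

" Analyse the plot means, which now have one unit per plot "
BLOCKSTRUCTURE rep
TREATMENTSTRUCTURE treatment
ANOVA      plotyield
```

Because Rep and Treatment are constant within every plot here, no carry warning should arise and CMETHOD has no effect.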

The NEWDATA parameter saves the summary statistics, and the NEWCLASSIFICATION option saves new factors giving the levels of the classifying factors for the summaries. These do not need to be set if you set REDEFINE=yes; the DATA and CLASSIFICATION structures are then redefined to be the summary statistics and factors, respectively.
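As a sketch of saving results with NEWDATA rather than redefining the original structures, assume the New Zealand income survey has been loaded as in the Example section, so that the factor Gender and the variate Income exist. The call below would save the lower and upper quartiles of income for each gender in new structures, leaving Gender and Income unchanged. The identifiers gender, lowerq and upperq are arbitrary names chosen for illustration, and the code is untested.

```genstat
" Lower and upper quartiles of Income within each level of Gender "
VSUMMARY   [CLASSIFICATION=Gender; NEWCLASSIFICATION=gender] \
           Income,Income; STATISTIC='quantiles','quantiles'; \
           PERCENTILE=25,75; NEWDATA=lowerq,upperq
PRINT      gender,lowerq,upperq
```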

The PRINT option allows you to print the summaries. By default, nothing is printed.

Options: PRINT, CLASSIFICATION, NEWCLASSIFICATION, REDEFINE, CMETHOD, MVINCLUDE, WARNING.
Parameters: DATA, STATISTIC, PERCENTILE, NEWDATA.

Method

VSUMMARY uses TABULATE to form tables for each statistic, and then VTABLE to extract the new summary factors and variates.

Action with RESTRICT

VSUMMARY takes account of any restrictions on the classifying factors or the DATA variates.
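A minimal sketch of this behaviour, again assuming the income survey data from the Example section are loaded: restricting the Income variate limits the summaries to the selected units. The 30-hour threshold and the identifiers sex and ftincome are hypothetical, chosen only for illustration, and the code is untested.

```genstat
" Summarize only respondents working at least 30 hours (illustrative threshold) "
RESTRICT   Income; CONDITION=Hours.GE.30
VSUMMARY   [PRINT=summaries; CLASSIFICATION=Gender; NEWCLASSIFICATION=sex] \
           Income; STATISTIC='means'; NEWDATA=ftincome
" Cancel the restriction afterwards "
RESTRICT   Income
```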

See also

Directives: TABULATE.
Procedures: MTABULATE, SVTABULATE, VTABLE.
Commands for: Basic and nonparametric statistics, Survey analysis.

Example

CAPTION    'VSUMMARY example','New Zealand income survey summaries'; \
           STYLE=meta,plain

SPLOAD     [PRINT=*] '%Data%/New Zealand Income Survey.GSH'

FOR "Group commands so print out is separate to echoed statements"
  VSUMMARY [PRINT=summaries; CLASS=Gender,Qualification; \
           NEWCLASS=gender,qualification] Age,Hours,Income; \
           STATISTIC='median'; NEWDATA=age,hours,income

  VSUMMARY [CLASS=Gender,Qualification,Marital,Ethnicity; REDEFINE=yes] \
           Age,Hours,Income; STATISTIC='mean'

  CAPTION 'Class means'; STYLE=minor
  PRINT    Gender,Qualification,Marital,Ethnicity,Age,Hours,Income; \
           FIELD=7,11,10,10,6,7,8; DECIMALS=0; JUST=4(left),3(right)
ENDFOR
Updated on February 10, 2022
