
# VSUMMARY procedure

Summarizes a variate, with classifying factors, into a data matrix of variates and factors (D.B. Baird).

### Options

| Option | Type | Description |
|---|---|---|
| `PRINT` | string token | What to print (`summaries`); default `*` i.e. none |
| `CLASSIFICATION` | factors | Factors classifying the summary groups |
| `NEWCLASSIFICATION` | factors | Factors in the data matrix to classify the output variates |
| `REDEFINE` | string token | Whether to redefine the `CLASSIFICATION` factors and `DATA` variates, if `NEWCLASSIFICATION` or `NEWDATA` are not set (`yes`, `no`); default `no` |
| `CMETHOD` | string token | How to form levels for carried factors (`median`, `minimum`, `maximum`); default `median` |
| `MVINCLUDE` | string token | Whether to include factor combinations with no observations in summaries (`yes`, `no`); default `no` |
| `WARNING` | string token | What warnings to output (`carry`); default `carry`, which warns when carried factors have varying values within classification groups |

### Parameters

| Parameter | Type | Description |
|---|---|---|
| `DATA` | variates, factors or pointers | Data to be summarized |
| `STATISTIC` | texts | What statistic to calculate (`carry`, `counts`, `sums`, `totals`, `nobservations`, `means`, `medians`, `minima`, `maxima`, `variances`, `quantiles`, `sds`, `skewness`, `kurtosis`, `semeans`, `seskewness`, `sekurtosis`); default `means` |
| `PERCENTILE` | scalars or variates | Percentile to be used for quantiles; default 50 |
| `NEWDATA` | variates, factors or pointers | Saves the summary statistics: variates, or factors when `STATISTIC=carry` |

### Description

`VSUMMARY` forms data matrices containing summary statistics rather than the usual tables created by `TABULATE`. This can be useful if the summary statistics are to be used in a further analysis (e.g. an analysis of variance).
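
The contrast with a table can be sketched in a few lines of Python. This illustrates the idea and is not Genstat code. Units are grouped by the combination of their classifying-factor levels, and the result comes back as parallel columns with one row per group, i.e. a small data matrix. The helper `summarize_means`, the list-of-columns layout and the use of `None` for missing values are all assumptions made for the sketch.

```python
from statistics import mean

def summarize_means(factors, data):
    """Hypothetical helper: group means returned as data-matrix columns.

    factors: list of equal-length lists, one per classifying factor
    data:    list of numbers (None marks a missing value)
    Returns (new_factors, new_data): parallel columns, one entry per group.
    """
    groups = {}
    for i, y in enumerate(data):
        key = tuple(f[i] for f in factors)   # combined levels of this unit
        groups.setdefault(key, [])
        if y is not None:                    # missing values are ignored
            groups[key].append(y)
    keys = sorted(groups)                    # assumption: groups sorted by label
    new_factors = [[k[j] for k in keys] for j in range(len(factors))]
    new_data = [mean(groups[k]) if groups[k] else None for k in keys]
    return new_factors, new_data


gender = ["F", "M", "F", "M", "F"]
qual = ["School", "School", "Degree", "Degree", "Degree"]
income = [30.0, 40.0, 50.0, 70.0, 60.0]

(new_gender, new_qual), mean_income = summarize_means([gender, qual], income)
for row in zip(new_gender, new_qual, mean_income):
    print(row)
```

Each returned column has one entry per group, so the columns can be used directly as the variates and factors of a further analysis. Here the groups are simply sorted by their labels, and the order Genstat uses may differ.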

The `CLASSIFICATION` option specifies the classifying factors for the summaries, and the `DATA` parameter provides the variates or factors to be summarized. The `STATISTIC` parameter specifies the type of numerical summary: counts, totals, numbers of non-missing values, means, medians, minima, maxima, variances, quantiles, standard deviations, skewness and kurtosis coefficients, and (within-cell) standard errors of means, skewness and kurtosis. The statistic `sums` is a synonym of `totals`. When `STATISTIC=quantiles`, the `PERCENTILE` parameter specifies the quantile to be calculated, as a percentage between 0 and 100. By default, factor combinations with no observations are omitted from the summaries; set `MVINCLUDE=yes` to include them.

The statistic `carry`, which applies only to factors, creates summary factors containing the level that occurs in each group. For example, in a field trial with repeated measurements on each plot, `carry` can be used to carry across the factors that give the replicate and treatment of each plot. If a carried factor varies within a classification group, a warning is given when `WARNING=carry` (the default); this can be suppressed by setting `WARNING=*`. The `CMETHOD` option then controls which level is used for such a group: the `median`, `minimum` or `maximum` of the levels present within the group.
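
The behaviour of `STATISTIC=carry` together with `CMETHOD` and `WARNING` can be sketched in Python (an analogy, not Genstat code). The function `carry_levels` is hypothetical and factor levels are assumed to be numbers. For `median` with an even number of units, the sketch assumes the lower of the two middle values is taken, so the result is always a level present in the group. How Genstat itself resolves that case is not stated here.

```python
import warnings

def carry_levels(classifier, carried, cmethod="median", warn=True):
    """Hypothetical sketch of STATISTIC=carry for one carried factor.

    classifier: group label for each unit (e.g. plot number)
    carried:    numeric factor level for each unit (e.g. block)
    cmethod:    'median', 'minimum' or 'maximum', mirroring CMETHOD
    warn:       emit a warning when levels vary, mirroring WARNING=carry
    Returns a dict mapping each group to its carried level.
    """
    groups = {}
    for g, level in zip(classifier, carried):
        groups.setdefault(g, []).append(level)
    result = {}
    for g, levels in groups.items():
        if warn and len(set(levels)) > 1:
            warnings.warn(f"carried factor varies within group {g!r}")
        ordered = sorted(levels)
        if cmethod == "minimum":
            result[g] = ordered[0]
        elif cmethod == "maximum":
            result[g] = ordered[-1]
        elif cmethod == "median":
            # Assumption: lower middle value, so the level occurs in the group.
            result[g] = ordered[(len(ordered) - 1) // 2]
        else:
            raise ValueError(f"unknown cmethod: {cmethod}")
    return result


plot = [1, 1, 1, 2, 2, 2]
block = [1, 1, 1, 2, 2, 3]   # block recorded inconsistently for plot 2

print(carry_levels(plot, block))                                   # {1: 1, 2: 2}
print(carry_levels(plot, block, cmethod="maximum", warn=False))   # {1: 1, 2: 3}
```

The first call also issues a warning for plot 2, which is the situation the `WARNING` option is meant to flag.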

The `NEWDATA` parameter saves the summary statistics, and the `NEWCLASSIFICATION` option saves new factors giving the levels of the classifying factors for each summary. Neither needs to be set if you set `REDEFINE=yes`: the `DATA` and `CLASSIFICATION` structures are then redefined to hold the summary statistics and factors, respectively.
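
The difference between saving the results under new names and redefining the originals can be mimicked with a dictionary of named columns standing in for Genstat data structures. This Python sketch is only an analogy. `vsummary_means`, its arguments and the dictionary layout are hypothetical, and only means for a single factor are handled.

```python
from statistics import mean

def vsummary_means(columns, class_name, data_name,
                   new_class=None, new_data=None, redefine=False):
    """Hypothetical sketch: group means of one variate by one factor.

    columns:   dict mapping structure names to lists of values
    new_class: name to save the group levels under (like NEWCLASSIFICATION)
    new_data:  name to save the means under (like NEWDATA)
    redefine:  overwrite an original column when no new name is given
               for it (like REDEFINE=yes)
    """
    groups = {}
    for level, y in zip(columns[class_name], columns[data_name]):
        groups.setdefault(level, []).append(y)
    levels = sorted(groups)
    means = [mean(groups[lv]) for lv in levels]
    if new_class is not None:
        columns[new_class] = levels
    elif redefine:
        columns[class_name] = levels
    if new_data is not None:
        columns[new_data] = means
    elif redefine:
        columns[data_name] = means
    return levels, means


data = {"Gender": ["F", "M", "F"], "Income": [30.0, 45.0, 50.0]}

# Save the summaries under new names; the originals are untouched.
vsummary_means(data, "Gender", "Income", new_class="gender", new_data="income")
print(data["gender"], data["income"])   # ['F', 'M'] [40.0, 45.0]

# Overwrite the original structures instead.
vsummary_means(data, "Gender", "Income", redefine=True)
print(data["Gender"], data["Income"])   # ['F', 'M'] [40.0, 45.0]
```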

The `PRINT` option allows you to print the summaries. By default, nothing is printed.

Options: `PRINT`, `CLASSIFICATION`, `NEWCLASSIFICATION`, `REDEFINE`, `CMETHOD`, `MVINCLUDE`, `WARNING`.
Parameters: `DATA`, `STATISTIC`, `PERCENTILE`, `NEWDATA`.

### Method

`VSUMMARY` uses `TABULATE` to form tables for each statistic, and then `VTABLE` to extract the new summary factors and variates.
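
The two-stage method can be imitated in Python. The first stage accumulates a table with a cell for every combination of factor levels, which is the role of `TABULATE`. The second stage flattens that table into factor and variate columns, which is the role of `VTABLE`. The sketch uses totals as the statistic, represents an empty cell by `None`, and adds an `mvinclude` flag to mimic the `MVINCLUDE` option. All names are hypothetical, and the treatment of empty cells is an assumption.

```python
from itertools import product

def tabulate_totals(factors, levels, data):
    """Stage 1, like TABULATE: totals over the full cross-classification.

    factors: one list of unit levels per classifying factor
    levels:  the complete list of levels for each factor
    Every combination of levels gets a cell; cells with no observations stay None.
    """
    table = {cell: None for cell in product(*levels)}
    for i, y in enumerate(data):
        if y is None:  # skip missing values
            continue
        cell = tuple(f[i] for f in factors)
        table[cell] = y if table[cell] is None else table[cell] + y
    return table

def vtable(table, mvinclude=False):
    """Stage 2, like VTABLE: flatten the table into factor and variate columns.

    Empty (None) cells are dropped unless mvinclude=True, mirroring MVINCLUDE.
    """
    cells = [c for c, v in table.items() if mvinclude or v is not None]
    n_factors = len(next(iter(table)))
    new_factors = [[c[j] for c in cells] for j in range(n_factors)]
    new_data = [table[c] for c in cells]
    return new_factors, new_data


gender = ["F", "F", "M"]
qual = ["Degree", "School", "Degree"]
income = [50.0, 30.0, 70.0]
levels = [["F", "M"], ["Degree", "School"]]

table = tabulate_totals([gender, qual], levels, income)
print(vtable(table))                   # the empty (M, School) cell is omitted
print(vtable(table, mvinclude=True))   # or kept, with value None
```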

### Action with `RESTRICT`

`VSUMMARY` takes account of any restrictions on the classifying factors or the `DATA` variates.
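
Conceptually, a restriction removes units before the groups are summarized. The Python sketch below uses hypothetical names and is not Genstat code. It passes a Boolean mask `keep` in place of a `RESTRICT` on the classifying factor or data variate.

```python
from statistics import mean

def restricted_means(classifier, data, keep):
    """Hypothetical sketch: group means using only units where keep[i] is True.

    `keep` plays the role of a restriction: excluded units simply do not
    enter the summaries.
    """
    groups = {}
    for g, y, k in zip(classifier, data, keep):
        if k:
            groups.setdefault(g, []).append(y)
    return {g: mean(v) for g, v in sorted(groups.items())}


gender = ["F", "M", "F", "M"]
hours = [40.0, 20.0, 30.0, 50.0]
age = [25, 17, 40, 30]
adults = [a >= 18 for a in age]   # restrict to units aged 18 or over

print(restricted_means(gender, hours, adults))   # {'F': 35.0, 'M': 50.0}
```

The unit aged 17 is excluded, so the mean for `M` is based on a single unit.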

### See also

Directives: `TABULATE`.
Procedures: `MTABULATE`, `SVTABULATE`, `VTABLE`.
Commands for: Basic and nonparametric statistics, Survey analysis.

### Example

```
CAPTION    'VSUMMARY example','New Zealand income survey summaries'; \
           STYLE=meta,plain
"Load the survey data; PRINT=* suppresses output"
SPLOAD     [PRINT=*] '%Data%/New Zealand Income Survey.GSH'

FOR "Group commands so that printout is separate from echoed statements"
  "Medians by Gender and Qualification, saved in new structures"
  VSUMMARY [PRINT=summaries; CLASS=Gender,Qualification; \
           NEWCLASS=gender,qualification] Age,Hours,Income; \
           STATISTIC=median; NEWDATA=age,hours,income

  "Means by four factors; REDEFINE=yes overwrites the original structures"
  VSUMMARY [CLASS=Gender,Qualification,Marital,Ethnicity; REDEFINE=yes] \
           Age,Hours,Income; STATISTIC='mean'

  CAPTION  'Class means'; STYLE=minor
  PRINT    Gender,Qualification,Marital,Ethnicity,Age,Hours,Income; \
           FIELD=7,11,10,10,6,7,8; DECIMALS=0; JUST=4(left),3(right)
ENDFOR
```
Updated on February 10, 2022