Summarizes a variate, with classifying factors, into a data matrix of variates and factors (D.B. Baird).
Options
PRINT = string token | What to print (summaries); default * i.e. none
CLASSIFICATION = factors | Factors classifying the summary groups
NEWCLASSIFICATION = factors | Factors in the data matrix to classify the output variates
REDEFINE = string token | Whether to redefine the CLASSIFICATION factors and DATA variates, if NEWCLASSIFICATION or NEWDATA are not set (yes, no); default no
CMETHOD = string token | How to form levels for carried factors (median, minimum, maximum); default median
MVINCLUDE = string token | Whether to include factor combinations with no observations in summaries (yes, no); default no
WARNING = string token | What warnings to output (carry); default carry, which warns when carried factors have varying values within classification groups
Parameters
DATA = variates, factors or pointers | Data to be summarized
STATISTIC = texts | What statistic to calculate (carry, counts, sums, totals, nobservations, means, medians, minima, maxima, variances, quantiles, sds, skewness, kurtosis, semeans, seskewness, sekurtosis); default mean
PERCENTILE = scalars or variates | Percentile to be used for quantiles; default 50
NEWDATA = variates, factors or pointers | Summary statistics, saved as variates, or as factors when STATISTIC=carry
Description
VSUMMARY forms data matrices containing summary statistics rather than the usual tables created by TABULATE. This can be useful if the summary statistics are to be used in a further analysis (e.g. an analysis of variance).
The CLASSIFICATION option specifies the classifying factors for the summaries, and the DATA parameter provides the variates or factors to be summarized. The STATISTIC parameter specifies the type of numerical summary: counts, totals, numbers of non-missing values, means, medians, minima, maxima, variances, quantiles, standard deviations, skewness and kurtosis coefficients, and (within-cell) standard errors of means, skewness and kurtosis. The statistic sums is a synonym of totals. The statistic carry, which applies only to factors, creates summary factors holding the level that occurs in each group. For example, in a field trial with repeated measurements on each plot, you may want to carry across the factors that give the replicate and treatment for each plot. If a carried factor varies within a classification group, a warning is given when WARNING=carry (the default); this can be suppressed with WARNING=*. For groups with varying levels, the CMETHOD option controls which level is used for the summary: the median, minimum or maximum level present within the group. When STATISTIC=quantiles, the PERCENTILE parameter specifies the quantile to be calculated, as a percentage between 0 and 100.
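The field-trial case described above might be set up along the following lines. This is only a sketch: the structure names (Plot, Rep, Treat, Yield and the lower-case output names) and the data values are invented for illustration, and only options and parameters listed on this page are used.

```genstat
" Hypothetical field trial: 4 plots, each measured on 3 occasions "
FACTOR  [LEVELS=4; VALUES=3(1...4)] Plot
FACTOR  [LEVELS=2; VALUES=6(1,2)] Rep
FACTOR  [LABELS=!t(Control,Treated); VALUES=3(1,2,1,2)] Treat
VARIATE [VALUES=5.1,5.4,5.9, 6.2,6.8,7.1, 4.9,5.0,5.6, 6.5,6.9,7.4] Yield

" Carry Rep and Treat across to the plot summaries, and average Yield "
VSUMMARY [PRINT=summaries; CLASSIFICATION=Plot; NEWCLASSIFICATION=plot] \
         DATA=Rep,Treat,Yield; STATISTIC='carry','carry','means'; \
         NEWDATA=rep,treat,yield

" Lower and upper quartiles of Yield within each plot "
VSUMMARY [CLASSIFICATION=Plot; NEWCLASSIFICATION=qplot] \
         DATA=Yield,Yield; STATISTIC='quantiles'; PERCENTILE=25,75; \
         NEWDATA=lowerq,upperq
```

Here Rep and Treat are constant within each plot, so no carry warning would be expected and the CMETHOD setting should make no difference. If a carried factor did vary within plots, setting CMETHOD=minimum or CMETHOD=maximum would choose a different level from the default median.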
The NEWDATA parameter saves the summary statistics, and the NEWCLASSIFICATION option saves new factors that give the levels of the classifying factors for the summaries. Neither needs to be set if you set REDEFINE=yes. The DATA and CLASSIFICATION structures are then redefined to hold the summary statistics and factors respectively.
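As a rough illustration of the difference between saving new structures and redefining the originals, consider the sketch below. The names Block, Weight, block and weight, and the data values, are made up for the example.

```genstat
" Hypothetical data: 3 blocks with 4 observations each "
FACTOR  [LEVELS=3; VALUES=4(1...3)] Block
VARIATE [VALUES=12,14,11,13, 20,22,19,21, 30,31,29,32] Weight

" Save the block means in new structures; Block and Weight are unchanged "
VSUMMARY [CLASSIFICATION=Block; NEWCLASSIFICATION=block] \
         DATA=Weight; STATISTIC='means'; NEWDATA=weight
PRINT   block,weight

" Overwrite Block and Weight with the summary factor and the means "
VSUMMARY [CLASSIFICATION=Block; REDEFINE=yes] DATA=Weight; STATISTIC='means'
PRINT   Block,Weight
```

After the second call the original unit-level values of Block and Weight are no longer available, so REDEFINE=yes is best kept for cases where the raw data are not needed again or have been copied elsewhere first.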
The PRINT option allows you to print the summaries. By default, nothing is printed.
Options: PRINT, CLASSIFICATION, NEWCLASSIFICATION, REDEFINE, CMETHOD, MVINCLUDE, WARNING.
Parameters: DATA, STATISTIC, PERCENTILE, NEWDATA.
Method
VSUMMARY uses TABULATE to form tables for each statistic, and then VTABLE to extract the new summary factors and variates.
Action with RESTRICT
VSUMMARY takes account of any restrictions on the classifying factors or the DATA variates.
See also
Directives: TABULATE.
Procedures: MTABULATE, SVTABULATE, VTABLE.
Commands for: Basic and nonparametric statistics, Survey analysis.
Example
CAPTION  'VSUMMARY example','New Zealand income survey summaries'; \
         STYLE=meta,plain
SPLOAD   [PRINT=*] '%Data%/New Zealand Income Survey.GSH'
FOR "Group commands so print out is separate to echoed statements"
  VSUMMARY [PRINT=summaries; CLASS=Gender,Qualification; \
           NEWCLASS=gender,qualification] Age,Hours,Income; \
           STATISTIC=median; NEWDATA=age,hours,income
  VSUMMARY [CLASS=Gender,Qualification,Marital,Ethnicity; REDEFINE=yes] \
           Age,Hours,Income; STATISTIC='mean'
  CAPTION  'Class means'; STYLE=minor
  PRINT    Gender,Qualification,Marital,Ethnicity,Age,Hours,Income; \
           FIELD=7,11,10,10,6,7,8; DECIMALS=0; JUST=4(left),3(right)
ENDFOR