Summarizes a variate, with classifying factors, into a data matrix of variates and factors (D.B. Baird).
Options
PRINT = string token | What to print (summaries); default * i.e. none
CLASSIFICATION = factors | Factors classifying the summary groups
NEWCLASSIFICATION = factors | Factors in the data matrix to classify the output variates
REDEFINE = string token | Whether to redefine the CLASSIFICATION factors and DATA variates, if NEWCLASSIFICATION or NEWDATA are not set (yes, no); default no
CMETHOD = string token | How to form levels for carried factors (median, minimum, maximum); default median
MVINCLUDE = string token | Whether to include factor combinations with no observations in summaries (yes, no); default no
WARNING = string token | What warnings to output (carry); default carry, which warns when carried factors have varying values within classification groups
Parameters
DATA = variates, factors or pointers | Data to be summarized
STATISTIC = texts | What statistic to calculate (carry, counts, sums, totals, nobservations, means, medians, minima, maxima, variances, quantiles, sds, skewness, kurtosis, semeans, seskewness, sekurtosis); default means
PERCENTILE = scalars or variates | Percentile to be used for quantiles; default 50
NEWDATA = variates, factors or pointers | Summary statistics, saved as variates, or as factors when STATISTIC=carry
Description
VSUMMARY forms data matrices containing summary statistics rather than the usual tables created by TABULATE. This can be useful if the summary statistics are to be used in a further analysis (e.g. an analysis of variance).
The CLASSIFICATION option specifies the classifying factors for the summaries, and the DATA parameter provides the variates or factors to be summarized. The STATISTIC parameter specifies the type of numerical summary: counts, totals, numbers of non-missing values, means, medians, minima, maxima, variances, quantiles, standard deviations, skewness and kurtosis coefficients, and (within-cell) standard errors of means, skewness and kurtosis. The statistic sums is a synonym of totals. When STATISTIC=quantiles, the PERCENTILE parameter specifies the quantile to be calculated, as a percentage between 0 and 100.
The statistic carry, which applies only to factors, creates summary factors whose levels are those that occur in each group. For example, in a field trial with repeated measurements on plots, you may want to carry across the factors that give the replicate and treatment for each plot. If a carried factor varies within a classification group, a warning is given when WARNING=carry (the default); this can be suppressed with WARNING=*. For such groups, the CMETHOD option controls how the summary level is chosen: the median, minimum or maximum of the levels present within the group.
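As an illustration of carrying factors and requesting a quantile, the following sketch uses a small made-up field trial with four plots, each measured three times. The identifiers Plot, Rep, Treatment and Yield, the data values, and the lower-case output names are all hypothetical, and the statements are an untested sketch rather than a verified program.

```genstat
" Hypothetical field trial: 4 plots, each measured 3 times "
FACTOR [LEVELS=4; VALUES=3(1...4)] Plot
FACTOR [LEVELS=2; VALUES=6(1,2)] Rep
FACTOR [LABELS=!t(control,treated); VALUES=3(1,2,1,2)] Treatment
VARIATE [VALUES=5.1,5.4,5.0, 6.2,6.5,6.1, 4.8,5.0,4.9, 6.6,6.3,6.8] Yield
" Plot means and upper quartiles of Yield, carrying Rep and Treatment. "
" The single PERCENTILE setting is recycled across the DATA list. "
VSUMMARY [CLASSIFICATION=Plot; NEWCLASSIFICATION=plot; CMETHOD=median] \
  DATA=Yield,Yield,Rep,Treatment; \
  STATISTIC='means','quantiles','carry','carry'; PERCENTILE=75; \
  NEWDATA=ymean,yq75,rep,treatment
PRINT plot,rep,treatment,ymean,yq75
```

Rep and Treatment are constant within each plot in this made-up data, so no carry warning would be expected and the CMETHOD setting should make no difference to the carried levels.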
The NEWDATA parameter saves the summary statistics, and the NEWCLASSIFICATION option saves new factors that give the levels of the classifying factors for the summaries. Neither needs to be set if you set REDEFINE=yes: the DATA and CLASSIFICATION structures are then redefined to be the summary statistics and factors, respectively.
The PRINT option allows you to print the summaries. By default, nothing is printed.
Options: PRINT, CLASSIFICATION, NEWCLASSIFICATION, REDEFINE, CMETHOD, MVINCLUDE, WARNING.
Parameters: DATA, STATISTIC, PERCENTILE, NEWDATA.
Method
VSUMMARY uses TABULATE to form tables for each statistic, and then VTABLE to extract the new summary factors and variates.
Action with RESTRICT
VSUMMARY takes account of any restrictions on the classifying factors or the DATA variates.
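A minimal sketch of summarizing a restricted variate is given below. The data and the identifiers Rep, Treatment, Yield, trt and ymean are hypothetical, and the statements have not been run; they assume only the standard RESTRICT directive with its CONDITION parameter.

```genstat
" Hypothetical data: 12 units in two replicates and two treatments "
FACTOR [LEVELS=2; VALUES=6(1,2)] Rep
FACTOR [LABELS=!t(control,treated); VALUES=3(1,2,1,2)] Treatment
VARIATE [VALUES=5.1,5.4,5.0, 6.2,6.5,6.1, 4.8,5.0,4.9, 6.6,6.3,6.8] Yield
" Restrict the data variate to the units in replicate 1 "
RESTRICT Yield; CONDITION=Rep.EQ.1
VSUMMARY [PRINT=summaries; CLASSIFICATION=Treatment; NEWCLASSIFICATION=trt] \
  Yield; STATISTIC='means'; NEWDATA=ymean
" Remove the restriction once the summaries have been formed "
RESTRICT Yield
```

With the restriction in place, the treatment means should be formed from the six units in replicate 1 only.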
See also
Directives: TABULATE.
Procedures: MTABULATE, SVTABULATE, VTABLE.
Commands for: Basic and nonparametric statistics, Survey analysis.
Example
CAPTION 'VSUMMARY example','New Zealand income survey summaries'; \
STYLE=meta,plain
SPLOAD [PRINT=*] '%Data%/New Zealand Income Survey.GSH'
FOR "Group commands so print out is separate to echoed statements"
VSUMMARY [PRINT=summaries; CLASS=Gender,Qualification; \
NEWCLASS=gender,qualification] Age,Hours,Income; \
STATISTIC=median; NEWDATA=age,hours,income
VSUMMARY [CLASS=Gender,Qualification,Marital,Ethnicity; REDEFINE=yes] \
Age,Hours,Income; STATISTIC='mean'
CAPTION 'Class means'; STYLE=minor
PRINT Gender,Qualification,Marital,Ethnicity,Age,Hours,Income; \
FIELD=7,11,10,10,6,7,8; DECIMALS=0; JUST=4(left),3(right)
ENDFOR