Reshapes a data set with classifying factors for rows and columns, into a reorganized data set with new identifying factors (D.B. Baird).
Options
PRINT = string token |
What to print (results ); default *, i.e. none |
ROWCLASSIFICATION = factors, texts, variates or pointer |
Factors classifying the rows in the data; default a factor called Rows with a level for each row |
COLCLASSIFICATION = factors, texts, variates or pointer |
Factors or texts classifying the columns in the data; default a factor called Columns with labels formed from the column identifiers in DATA |
MEANFACTORS = factors, texts, variates or pointer |
Row or column factors whose groups are averaged in the output data set |
TOTALFACTORS = factors, texts, variates or pointer |
Row or column factors whose groups are totalled in the output data set |
FIRSTSUMMARY = string token |
Which summaries to form first (means , totals ) default means |
NEWROWFACTORS = factors |
Factors to index the new rows |
NEWCOLUMNFACTORS = factors, texts or variates |
Factors to indexing the columns in the new data set |
REDEFINE = string token |
Whether to redefine the NEWROWFACTORS factors and DATA columns, if NEWROWFACTORS or NEWDATA are not set or use names used in the input data (yes , no ); default no |
MVINCLUDE = string token |
Whether to include factor combinations with no observations in the output data set (* ,rows , columns ); default * ; i.e. remove missing rows and columns |
Parameters
DATA = pointers |
Pointer containing data to be reshaped |
NEWDATA = pointers |
Pointer containing the reshaped data columns |
Description
RESHAPE
reshapes data matrices. This is useful when rows and columns in a data set are to be swapped to another dimension (rows to columns or vice-versa). It combines the functionality of STACK
and UNSTACK
in a single procedure. The data columns are stacked into a single columns, with factors indexing the resulting rows created from the original row and column factors. It is then unstacked by one or more of these factors to reshape the data. RESHAPE
goes beyond STACK
, in that more than one column factor can be defined. For example, if you have variates containing different measurements taken at several different times, one column factor could index the measurements and another the times. Also, the data can be collapsed across some of the factors by taking totals or means.
The ROWCLASSIFICATION
and COLCLASSIFICATION
options classify the rows and columns, respectively, and together these provide the input factors. If texts or variates are specified with these options, they are converted internally to factors. If any of these options is not specified, a default factor (Rows
or Columns
) is created, and this can be used in the NEWCOLUMNFACTORS
, TOTALFACTORS
or MEANFACTORS
options, described below. However, the Columns
factor is formed only if there is more than one vector in the pointer specified by the DATA
parameter. This pointer specifies the variates, text or factors to be reorganized. The vectors in DATA
must be compatible with the first vector, and this determines the type of the resulting column. Any type of vector can be combined with a factor or a text, but texts cannot be combined with a variate. If the COLCLASSIFICATION
factors have no values, a warning is given, and their values are formed using GENERATE
. A fault is given if the product of the numbers of levels is not equal to the number of vectors in DATA
.
The TOTALFACTORS
and MEANFACTORS
options specify factors over which totals or means, respectively, are to be formed. The FIRSTSUMMARY
option controls whether the means
or the totals
are formed first.
This will make a difference only when there are missing values in the data matrix. Suppose, for example, we have a 3 × 2 classification of (3,5,6,3,7,*) where the final cell is missing. Totalling (3,5,6) and (3,7,*) and then averaging would give (14 + 10)/2 = 12. However, averaging (3,3), (5,7) and (6,*) and then totalling would give (3+6+6) = 15. If the missing value is replaced by 8, both orders of operation would give the same result, as either (14+18)/2 or (3+6+7) i.e. 16.
The NEWDATA
parameter saves a pointer containing the reshaped data columns. If the NEWDATA
pointer is undefined, it will be created with labels formed from the input factors labels (if present) or levels. The NEWCOLUMNFACTORS
option lists factors to index the columns in the new data pointer. A column in the output data is created for each combination of these factors (but some of these columns may be dropped according to the setting of the MVINCLUDE
option as explained below). The NEWROWFACTORS
option specifies new row factors, which are formed from the input factors that have not been used in the NEWCOLUMNFACTORS
, MEANS
or TOTALS
options. The factors specified by NEWROWFACTORS
are allocated to the original factors in the order in which they were specified, first by the ROWCLASSIFICATION
option, and then by the COLCLASSIFICATION
option. These input factors can be reused if you set option REDEFINE = yes
.
The MVINCLUDE
option controls whether empty rows or columns are included in the new data set. The default setting, *
, removes both empty rows and columns, rows
includes empty rows, columns
includes empty columns, and rows,columns
includes both empty rows and columns.
You can set option PRINT=results
to print the new data set. By default, nothing is printed.
Options: PRINT
, ROWCLASSIFICATION
, COLCLASSIFICATION
, TOTALFACTORS
, MEANFACTORS
, FIRSTSUMMARY
, NEWROWFACTORS
, NEWCOLUMNFACTORS
, REDEFINE
, MVINCLUDE
.
Parameters: DATA
, NEWDATA
.
Method
RESHAPE
uses APPEND
to form the data set into a single column indexed by all the input factors. TABULATE
is used to form means and totals, and then VTABLE
is used to extract the data from the summary tables. UNSTACK
is used to extract the reshaped columns. SUBSET
is used to remove missing rows.
Action with RESTRICT
RESHAPE
ignores any restrictions on the input factors or the DATA
structures.
See also
Directives: EQUATE
, TABULATE
.
Procedures: APPEND
, STACK
, SUBSET
, UNSTACK
, VSUMMARY
, VTABLE
.
Commands for: Calculations and manipulation.
Example
CAPTION 'RESHAPE Example',\ !T('DDT pesticide soil samples from 1970-1993 on long term irrigation', \ 'trial with 4 replicates. DDE and DDD are breakdown products of DDT.',\ 'Measurement units are parts per million (ppm).'); \ STYLE=meta,plain "Read in data with columns giving plots x chemicals and rows years x depth" IMPORT [PRINT=summary] '%Examples%/DDT Soil Samples 1970-93.xlsx'; \ CELL='A4:Z76'; ISAVE=pData "Display input data in a spreadsheet" FSPREADSHEET pData "Set up factors indexing the data columns D1T to I4D where D = dryland, I = irrgated, 1-4 are replicates and T,E and D are chemicals DDT, DDE and DDD" FACTOR [LABELS=!T(Dryland,Irrigated);VALUES=12(1,2)] Irrig FACTOR [LEVELS=4; VALUES=3(1...4)2] Rep FACTOR [LABELS=!T(DDT,DDE,DDD); VALUES=(1...3)8] Chemical " Get the data columns as columns 1 & 2 are the row factors" POINTER [VALUES=pData[3...26]] data "Put the chemicals into columns, all other factors stacked" RESHAPE [PRINT=results; ROWCLASS=Year,Depth; COLCLASS=Irrig,Rep,Chemical; \ NEWCOLUMN=Chemical; NEWROW=Years,SoilDepth,Irrigation,Reps] \ data; NEWDATA=!P(DDT,DDE,DDD) "Put the chemicals into columns, forming means over depths, including missing plots" RESHAPE [PRINT=results; ROWCLASS=Year,Depth; COLCLASS=Irrig,Rep,Chemical; \ MEAN=Depth; NEWCOLUMN=Chemical; NEWROW=Years,Irrigation,Reps; \ MVINCLUDE=rows] data; NEWDATA=!P(DDT,DDE,DDD) "Put the depths into columns, forming totals over the chemicals" RESHAPE [PRINT=results; ROWCLASS=Year,Depth; COLCLASS=Irrig,Rep,Chemical; \ TOTAL=Chemical; NEWCOLUMN=Depth; NEWROW=Years,Irrigation,Reps] \ data; NEWDATA=D