RESHAPE procedure

Reshapes a data set with classifying factors for rows and columns, into a reorganized data set with new identifying factors (D.B. Baird).

Options

`PRINT` = string token	What to print (`results`); default *, i.e. none
`ROWCLASSIFICATION` = factors, texts, variates or pointer	Factors classifying the rows in the data; default a factor called Rows with a level for each row
`COLCLASSIFICATION` = factors, texts, variates or pointer	Factors or texts classifying the columns in the data; default a factor called Columns with labels formed from the column identifiers in `DATA`
`MEANFACTORS` = factors, texts, variates or pointer	Row or column factors whose groups are averaged in the output data set
`TOTALFACTORS` = factors, texts, variates or pointer	Row or column factors whose groups are totalled in the output data set
`FIRSTSUMMARY` = string token	Which summaries to form first (`means`, `totals`) default `means`
`NEWROWFACTORS` = factors	Factors to index the new rows
`NEWCOLUMNFACTORS` = factors, texts or variates	Factors to indexing the columns in the new data set
`REDEFINE` = string token	Whether to redefine the `NEWROWFACTORS` factors and `DATA` columns, if `NEWROWFACTORS` or `NEWDATA` are not set or use names used in the input data (`yes`, `no`); default `no`
`MVINCLUDE` = string token	Whether to include factor combinations with no observations in the output data set (``,`rows`, `columns`); default ``; i.e. remove missing rows and columns

Parameters

`DATA` = pointers	Pointer containing data to be reshaped
`NEWDATA` = pointers	Pointer containing the reshaped data columns

Description

RESHAPE reshapes data matrices. This is useful when rows and columns in a data set are to be swapped to another dimension (rows to columns or vice-versa). It combines the functionality of STACK and UNSTACK in a single procedure. The data columns are stacked into a single columns, with factors indexing the resulting rows created from the original row and column factors. It is then unstacked by one or more of these factors to reshape the data. RESHAPE goes beyond STACK, in that more than one column factor can be defined. For example, if you have variates containing different measurements taken at several different times, one column factor could index the measurements and another the times. Also, the data can be collapsed across some of the factors by taking totals or means.

The ROWCLASSIFICATION and COLCLASSIFICATION options classify the rows and columns, respectively, and together these provide the input factors. If texts or variates are specified with these options, they are converted internally to factors. If any of these options is not specified, a default factor (Rows or Columns) is created, and this can be used in the NEWCOLUMNFACTORS, TOTALFACTORS or MEANFACTORS options, described below. However, the Columns factor is formed only if there is more than one vector in the pointer specified by the DATA parameter. This pointer specifies the variates, text or factors to be reorganized. The vectors in DATA must be compatible with the first vector, and this determines the type of the resulting column. Any type of vector can be combined with a factor or a text, but texts cannot be combined with a variate. If the COLCLASSIFICATION factors have no values, a warning is given, and their values are formed using GENERATE. A fault is given if the product of the numbers of levels is not equal to the number of vectors in DATA.

The TOTALFACTORS and MEANFACTORS options specify factors over which totals or means, respectively, are to be formed. The FIRSTSUMMARY option controls whether the means or the totals are formed first.

This will make a difference only when there are missing values in the data matrix. Suppose, for example, we have a 3 × 2 classification of (3,5,6,3,7,*) where the final cell is missing. Totalling (3,5,6) and (3,7,*) and then averaging would give (14 + 10)/2 = 12. However, averaging (3,3), (5,7) and (6,*) and then totalling would give (3+6+6) = 15. If the missing value is replaced by 8, both orders of operation would give the same result, as either (14+18)/2 or (3+6+7) i.e. 16.

The NEWDATA parameter saves a pointer containing the reshaped data columns. If the NEWDATA pointer is undefined, it will be created with labels formed from the input factors labels (if present) or levels. The NEWCOLUMNFACTORS option lists factors to index the columns in the new data pointer. A column in the output data is created for each combination of these factors (but some of these columns may be dropped according to the setting of the MVINCLUDE option as explained below). The NEWROWFACTORS option specifies new row factors, which are formed from the input factors that have not been used in the NEWCOLUMNFACTORS, MEANS or TOTALS options. The factors specified by NEWROWFACTORS are allocated to the original factors in the order in which they were specified, first by the ROWCLASSIFICATION option, and then by the COLCLASSIFICATION option. These input factors can be reused if you set option REDEFINE = yes.

The MVINCLUDE option controls whether empty rows or columns are included in the new data set. The default setting, *, removes both empty rows and columns, rows includes empty rows, columns includes empty columns, and rows,columns includes both empty rows and columns.

You can set option PRINT=results to print the new data set. By default, nothing is printed.

Options: PRINT, ROWCLASSIFICATION, COLCLASSIFICATION, TOTALFACTORS, MEANFACTORS, FIRSTSUMMARY, NEWROWFACTORS, NEWCOLUMNFACTORS, REDEFINE, MVINCLUDE.
Parameters: DATA, NEWDATA.

Method

RESHAPE uses APPEND to form the data set into a single column indexed by all the input factors. TABULATE is used to form means and totals, and then VTABLE is used to extract the data from the summary tables. UNSTACK is used to extract the reshaped columns. SUBSET is used to remove missing rows.

Action with `RESTRICT`

RESHAPE ignores any restrictions on the input factors or the DATA structures.

Example

CAPTION 'RESHAPE Example',\
   !T('DDT pesticide soil samples from 1970-1993 on long term irrigation', \
   'trial with 4 replicates. DDE and DDD are breakdown products of DDT.',\
   'Measurement units are parts per million (ppm).');  \
   STYLE=meta,plain
   
"Read in data with columns giving plots x chemicals and rows years x depth"   
IMPORT [PRINT=summary] '%Examples%/DDT Soil Samples 1970-93.xlsx'; \
   CELL='A4:Z76'; ISAVE=pData

"Display input data in a spreadsheet"
FSPREADSHEET pData

"Set up factors indexing the data columns D1T to I4D where 
 D = dryland, I = irrgated, 1-4 are replicates and 
 T,E and D are chemicals DDT, DDE and DDD"
FACTOR [LABELS=!T(Dryland,Irrigated);VALUES=12(1,2)] Irrig
FACTOR [LEVELS=4; VALUES=3(1...4)2] Rep
FACTOR [LABELS=!T(DDT,DDE,DDD); VALUES=(1...3)8] Chemical
" Get the data columns as columns 1 & 2 are the row factors"
POINTER [VALUES=pData[3...26]] data

"Put the chemicals into columns, all other factors stacked"
RESHAPE [PRINT=results; ROWCLASS=Year,Depth; COLCLASS=Irrig,Rep,Chemical; \
   NEWCOLUMN=Chemical; NEWROW=Years,SoilDepth,Irrigation,Reps] \
   data; NEWDATA=!P(DDT,DDE,DDD)
   
"Put the chemicals into columns, forming means over depths, 
 including missing plots"
RESHAPE [PRINT=results; ROWCLASS=Year,Depth; COLCLASS=Irrig,Rep,Chemical; \
   MEAN=Depth; NEWCOLUMN=Chemical; NEWROW=Years,Irrigation,Reps; \
   MVINCLUDE=rows] data; NEWDATA=!P(DDT,DDE,DDD)

"Put the depths into columns, forming totals over the chemicals"
RESHAPE [PRINT=results; ROWCLASS=Year,Depth; COLCLASS=Irrig,Rep,Chemical; \
   TOTAL=Chemical; NEWCOLUMN=Depth; NEWROW=Years,Irrigation,Reps] \
   data; NEWDATA=D

Updated on May 26, 2022

Was this article helpful?

Yes No