Reshapes a data set with classifying factors for rows and columns, into a reorganized data set with new identifying factors (D.B. Baird).
|What to print (
||Factors classifying the rows in the data; default a factor called Rows with a level for each row|
||Factors or texts classifying the columns in the data; default a factor called Columns with labels formed from the column identifiers in
||Row or column factors whose groups are averaged in the output data set|
||Row or column factors whose groups are totalled in the output data set|
||Which summaries to form first (
||Factors to index the new rows|
||Factors to indexing the columns in the new data set|
||Whether to redefine the
||Whether to include factor combinations with no observations in the output data set (
||Pointer containing data to be reshaped|
||Pointer containing the reshaped data columns|
RESHAPE reshapes data matrices. This is useful when rows and columns in a data set are to be swapped to another dimension (rows to columns or vice-versa). It combines the functionality of
UNSTACK in a single procedure. The data columns are stacked into a single columns, with factors indexing the resulting rows created from the original row and column factors. It is then unstacked by one or more of these factors to reshape the data.
RESHAPE goes beyond
STACK, in that more than one column factor can be defined. For example, if you have variates containing different measurements taken at several different times, one column factor could index the measurements and another the times. Also, the data can be collapsed across some of the factors by taking totals or means.
COLCLASSIFICATION options classify the rows and columns, respectively, and together these provide the input factors. If texts or variates are specified with these options, they are converted internally to factors. If any of these options is not specified, a default factor (
Columns) is created, and this can be used in the
MEANFACTORS options, described below. However, the
Columns factor is formed only if there is more than one vector in the pointer specified by the
DATA parameter. This pointer specifies the variates, text or factors to be reorganized. The vectors in
DATA must be compatible with the first vector, and this determines the type of the resulting column. Any type of vector can be combined with a factor or a text, but texts cannot be combined with a variate. If the
COLCLASSIFICATION factors have no values, a warning is given, and their values are formed using
GENERATE. A fault is given if the product of the numbers of levels is not equal to the number of vectors in
MEANFACTORS options specify factors over which totals or means, respectively, are to be formed. The
FIRSTSUMMARY option controls whether the
means or the
totals are formed first.
This will make a difference only when there are missing values in the data matrix. Suppose, for example, we have a 3 × 2 classification of (3,5,6,3,7,*) where the final cell is missing. Totalling (3,5,6) and (3,7,*) and then averaging would give (14 + 10)/2 = 12. However, averaging (3,3), (5,7) and (6,*) and then totalling would give (3+6+6) = 15. If the missing value is replaced by 8, both orders of operation would give the same result, as either (14+18)/2 or (3+6+7) i.e. 16.
NEWDATA parameter saves a pointer containing the reshaped data columns. If the
NEWDATA pointer is undefined, it will be created with labels formed from the input factors labels (if present) or levels. The
NEWCOLUMNFACTORS option lists factors to index the columns in the new data pointer. A column in the output data is created for each combination of these factors (but some of these columns may be dropped according to the setting of the
MVINCLUDE option as explained below). The
NEWROWFACTORS option specifies new row factors, which are formed from the input factors that have not been used in the
TOTALS options. The factors specified by
NEWROWFACTORS are allocated to the original factors in the order in which they were specified, first by the
ROWCLASSIFICATION option, and then by the
COLCLASSIFICATION option. These input factors can be reused if you set option
REDEFINE = yes.
MVINCLUDE option controls whether empty rows or columns are included in the new data set. The default setting,
*, removes both empty rows and columns,
rows includes empty rows,
columns includes empty columns, and
rows,columns includes both empty rows and columns.
You can set option
PRINT=results to print the new data set. By default, nothing is printed.
APPEND to form the data set into a single column indexed by all the input factors.
TABULATE is used to form means and totals, and then
VTABLE is used to extract the data from the summary tables.
UNSTACK is used to extract the reshaped columns.
SUBSET is used to remove missing rows.
RESHAPE ignores any restrictions on the input factors or the
CAPTION 'RESHAPE Example',\ !T('DDT pesticide soil samples from 1970-1993 on long term irrigation', \ 'trial with 4 replicates. DDE and DDD are breakdown products of DDT.',\ 'Measurement units are parts per million (ppm).'); \ STYLE=meta,plain "Read in data with columns giving plots x chemicals and rows years x depth" IMPORT [PRINT=summary] '%Examples%/DDT Soil Samples 1970-93.xlsx'; \ CELL='A4:Z76'; ISAVE=pData "Display input data in a spreadsheet" FSPREADSHEET pData "Set up factors indexing the data columns D1T to I4D where D = dryland, I = irrgated, 1-4 are replicates and T,E and D are chemicals DDT, DDE and DDD" FACTOR [LABELS=!T(Dryland,Irrigated);VALUES=12(1,2)] Irrig FACTOR [LEVELS=4; VALUES=3(1...4)2] Rep FACTOR [LABELS=!T(DDT,DDE,DDD); VALUES=(1...3)8] Chemical " Get the data columns as columns 1 & 2 are the row factors" POINTER [VALUES=pData[3...26]] data "Put the chemicals into columns, all other factors stacked" RESHAPE [PRINT=results; ROWCLASS=Year,Depth; COLCLASS=Irrig,Rep,Chemical; \ NEWCOLUMN=Chemical; NEWROW=Years,SoilDepth,Irrigation,Reps] \ data; NEWDATA=!P(DDT,DDE,DDD) "Put the chemicals into columns, forming means over depths, including missing plots" RESHAPE [PRINT=results; ROWCLASS=Year,Depth; COLCLASS=Irrig,Rep,Chemical; \ MEAN=Depth; NEWCOLUMN=Chemical; NEWROW=Years,Irrigation,Reps; \ MVINCLUDE=rows] data; NEWDATA=!P(DDT,DDE,DDD) "Put the depths into columns, forming totals over the chemicals" RESHAPE [PRINT=results; ROWCLASS=Year,Depth; COLCLASS=Irrig,Rep,Chemical; \ TOTAL=Chemical; NEWCOLUMN=Depth; NEWROW=Years,Irrigation,Reps] \ data; NEWDATA=D