SPCOMBINE procedure

Combines spreadsheet and data files, without reading them into Genstat (D.B. Baird).

 

Options

OUTFILENAME = text Name of the output file
METHOD = string token How to add the new data from the files specified by the FILENAME parameter (add, append, concatenate, merge); default appe
COLMATCH = string token How to match columns when appending (name, position); default posi
GROUPS = factor Factor to identify sections of appended files
OLDGLABEL = texts Label to use in the GROUPS factor for the original data if GROUPS has not already been defined
MATCH = text or pointer Up to four columns in the files specified by the FILENAME parameter to use as keys when merging files; default * uses the first column in the file
WITH = text or pointer Columns in the OUTFILENAME file to use as keys when merging files; default * uses as many columns of the initial columns in OUTFILENAME as are needed to give a column for each MATCH column
UPDATE = string token Whether to use columns with matching names to replace existing columns when concatenating or merging files (yes, no); default no changes the names of columns with the same name as existing columns so that they become unique
EXTRAROWS = string token What to do with extra rows when merging files (all, matched, none); default all merges in all the extra rows into the data, matched merges in just the extra rows which have matching ids into the data, and none does not merge the extra rows into the data.

 

Parameters

FILENAME = texts Names of files containing new data to be combined with the data in the OUTFILENAME file
SHEETNAME = texts Name of a worksheet or a named range within an Excel, Quattro, 123 or Open Office spreadsheet file; default takes the first sheet
CELLRANGE = texts Cell range giving the top left and bottom right cells within a worksheet; default takes all the data that it contains
ROWSELECTION = variates Row numbers of the units of data to be included into the OUTFILENAME file; default takes all the rows
COLSELECTION = variates Numbers of the columns of data to be combined with the OUTFILENAME file; default takes all the columns
PAGENAME = texts Page name for each new sheet when METHOD=add; default 'SHEET' where n is the number of the sheet in the OUTFILENAME file, unless the sheet is already named in the FILENAME file
GLABEL = texts Label to use in the GROUPS factor to identify the data from each FILENAME file; if this is unset, GROUPS is defined with only levels

 

Description

SPCOMBINE combines spreadsheet files into a single file, without reading all the data into Genstat. This is intended for use in particular with very large data sets.

The names of the files containing the new data values to be combined with an original dataset are specified using the FILENAME parameter. The file name can also be an internet URL prefixed with http://, https:// ftp:// or file://, in which case the data source is downloaded and then imported. The name of the output file is specified by the OUTFILENAME option. This may already contain a set of data. Alternatively, if it does not exist already, it will be created.

The following file types are supported for input: Excel 2-5, 95, 97, 2000, XP, 2003, 2007, Open Office, Lotus WK1, Quattro (WQ1, WB*, QPW), dBase 2-5, Paradox 3-9, Genstat GSH and GWB, SAS PC 6.03-12, 7-9, SAS Transport, SAS JMP, Minitab 8-14, Statistica 5 and 6, Systat, MStat, Instat, Epi-Info, SPSS/Win, Gauss Data/Matrix (PC/Win/Unix), MatLab, S+ (PC/Unix), Stata 4-8, StatGraphics, R data frames, Weka Attribute files, SigmaPlot 7-9, OSIRIS, Limdep, RATS, EViews, GRETL panel files, Comma delimited text files (*.CSV), Cornell Ecology format, MapQTL trait files (.QUA), ArcView/Info Shapefiles, MapInfo Exchange files, Windows Bitmap (*.BMP), Windows Sound (*.WAV), NMR Binary files and image files (JPG, GIF, TIF, PNG). The file type is worked out from the file contents, so the usual extension need not be used with the exception of the following file types which do not contain a unique signature: Epi-Info (.REC), S+ (.SDD) and Paradox (.DB). Any files not containing a unique file signature, but ending in these extensions will be classified as above. Any other file extensions will be attempted to be read as a comma, space or tab delimited text file. A subset of these file types is supported for output in the OUTFILENAME file: Genstat, Excel, Open Office, R, dBase, Lotus, Weka, SAS Transport, CSV and TXT. If the OUTFILENAME file does not exist, its format is determined by its extension.

You can use the SHEETNAME and CELLRANGE parameters to define a specified section of data to take from a spreadsheet file (Excel, Quattro, 123, Open Office). In addition, the ROWSELECTION and COLSELECTION parameters can specify that a subset of the rows or columns, respectively, is to be included. They can be set to a variate containing the numbers of the rows or columns. With COLSELECTION, you can also supply a text containing column names. So, for example, to import only rows where the variate Year is greater than 2005, you could put

ROWSELECTION = WHERE(Year > 2005)

(the WHERE function gives the unit numbers where a logical expression has the value one i.e. true). Note that the variate Year must already have been imported into Genstat, in order to do the calculation – this can be done using the IMPORT procedure.

The METHOD option controls how the files are combined. The add setting can be used to include the new data from each FILENAME file as a new sheet in the OUTFILENAME file, provided this is an Excel or Genstat GWB file. The PAGENAME parameter specifies the name to use for the page of the added sheet. If this is not set, SPCOMBINE looks to see whether the sheet is already named in the FILENAME file. If so, it will use that name, adding a number at the end, if necessary, to make the name unique. Otherwise, the name will be 'SHEET' where n is the number of the sheet in the OUTFILENAME file.

The append setting of METHOD appends the new data values at the end of those in the OUTFILENAME file. The COLMATCH option specifies whether the columns are matched by name or position. If a matching column is not found, a new empty column is created and the data are appended to that column. The append will fail if the types of the original and appended data do not match (e.g. if you attempt to append a text onto a variate). The GROUPS option can specify a factor to indicate the source of the data. If the factor is not already present in the OUTFILENAME file, the OLDGLABEL option can supply a label to use to identify the original rows of data. The GLABEL parameter can specify the label to use for the new rows appended from the FILENAME file. If these are not supplied, the factor will have only levels.

If METHOD=concatenate, the new data from the FILENAME file are added as new columns on the right-hand side of the sheet in the OUTFILENAME file. However, the types of the sheets (vectors, matrix or scalar) and their lengths must match. The new data can also be added as new columns on the right-hand side of the OUTFILENAME file by setting METHOD=merge. The rows of data from the FILENAME file are now merged with the original rows using up to four key columns specified by the MATCH and WITH options (for the new and original rows, respectively). If MATCH is not specified, the first column of new data is used. If WITH is not specified, the MATCH columns are matched with the same number of initial columns of the OUTFILENAME file. If a column with the same name already exists in the OUTFILENAME file when concatenating or merging, the default action is to rename the new column by adding a number to the end of the name to make it unique. Alternatively, if you set option UPDATE=yes, the new column will replace the existing column. When METHOD=merge, the EXTRAROWS option controls what happens to rows in the data being merged that do not appear already in the OUTFILENAME file: these rows can be added (all), added only if the id is already in the file (matched) or omitted (none).

Options: OUTFILENAME, METHOD, COLMATCH, GROUPS, OLDGLABEL, MATCH, WITH, UPDATE, EXTRAROWS.

Parameters: FILENAME, SHEETNAME, CELLRANGE, ROWSELECTION, COLSELECTION, PAGENAME, GLABEL.

 

Method

The data files are combined by passing a request to the Dataload.dll library. This reads the data from OUTFILENAME and each FILENAME, includes the new data, and then rewrites OUTFILENAME.

 

Action with RESTRICT

When METHOD=concatenate, restrictions from FILENAME are added to the new rows of OUTFILENAME. When METHOD=add, restrictions from FILENAME are retained on the new pages in OUTFILENAME. Otherwise restrictions are ignored.

 

See also

Directive: SPLOAD.

Procedure: IMPORT.

Commands for: Input and output.

Example

CAPTION 'SPCOMBINE example'; STYLE=meta
FOR [INDEX=i] F='F1.GWB','F2.GWB','F3.GWB'
  CALC     [SEED=833611] I = GRSAMPLE(10;!(1...20))
  CALC     X,Y,Z = i*(10,100,1000) + !(1...10)
  RESTRICT I,X,Y,Z; I < 12
  EXPORT   [OUTFILE=F; METHOD=overwrite] I,X,Y,Z
  DELETE   [REDEFINE=yes] I,X,Y,Z
ENDFOR

SPCOMBINE  [OUTFILE='Append.GWB'; METHOD=append; GROUPS='File']\ 
           FILE='F1.GWB','F2.GWB','F3.GWB'; GLABEL='F1','F2','F3'
SPCOMBINE  [OUTFILE='Added.GWB'; METHOD=add;]\ 
           FILE='F1.GWB','F2.GWB','F3.GWB'; PAGENAME='F1','F2','F3'
SPCOMBINE  [OUTFILE='Concat.GWB'; METHOD=concatenate]\ 
           FILE='F1.GWB','F2.GWB','F3.GWB'
SPCOMBINE  [OUTFILE='Concat2.GWB'; METHOD=concatenate; UPDATE=yes]\ 
           FILE='F1.GWB','F2.GWB','F3.GWB'
SPCOMBINE  [OUTFILE='Merged.GWB'; METHOD=merge; MATCH='I'; WITH='I';\
           UPDATE=yes] FILE='F1.GWB','F2.GWB','F3.GWB'
Updated on January 17, 2018

Was this article helpful?