Excel Import Wizard

See our videos on importing data from Excel

The Excel import wizard takes you step by step through reading data from Microsoft Excel into a Genstat spreadsheet. After you have selected the file and worksheet or named range within the file you will be shown a preview of the data. You can go back at any point to select a new Excel file, page or cell range to ensure you obtain the data you require.

By default, the option Use Import Wizard when opening an Excel file is set within the menu Tools | Spreadsheet Options | Sheets tab dialog. However, if this open is not select an alternative dialog can be used when opening Excel files using the File | Open item on the menu bar.

Opening an Excel file using the import wizard involves 5 steps:

Step 1. Selection of the Excel file

This is the usual Genstat File | Open dialog, that prompts you to select the name of the Excel file that you want to read the data from. Select the filename you wish to use and then click OK.

Step 2: Selection of Worksheet or Named Range

This dialog provides a list of the worksheets (prefixed S:) and named ranges (prefixed R:) in the Excel spreadsheet. If named ranges are specified within the Excel file then these have the advantage that both the sheet and cell range are specified permanently in the Excel file. These are automatically updated if further rows or columns are inserted into the cell range.

The content of the cells to be read in can be selected with the Read in option:

See our video on using the Read in options

Values	The normal cell values displayed in Excel
Formulae	The formulae as text used in the cells
Font colours	The font or foreground colour used in the cells
Cell colours	The cell or background colour used in the cells
Font names	The font names used in the cells
Styles	The font styles (bold, italic, underline) as an integer used in the cells
Font size	The font size as a number used in the cells

The colours are in Genstat RGB numerical format (=256*(256*red + green) + blue, where red, green and blue can take values 0…255) for colours, the style is composed as bold + 2*italic+ 4*underline where bold = 1 if the font is bold and 0 otherwise, italic = 1 if the font is italic and 0 otherwise, and underline = 1 if underlined, 2 if double underlined, 3 if single accounting underlined,4 if double accounting underlined and 0 otherwise. Sometimes users encode data in the cell formats, for example colouring odd values, and this facility allows this information to be recovered and used in an analysis. The reading of the formulae also allows you to check for consistency in these.

The Apply Excel cell colours option if selected will display the Excel cell and font colours in the resulting Genstat spreadsheet. Note: applying Excel colours to large spreadsheets can use a large amount of extra memory.

Select one or more of these and click Next or click Back to select a different Excel file. Multiple selections can be made by holding down the Ctrl and Shift keys while clicking with the mouse or by clicking on the Select All button to select all the worksheets and named ranges. If two or more worksheets or named ranges are selected then the wizard goes straight to step 5 and all the data is opened into a multi-paged book.

If multiple data ranges are to be loaded from the same worksheet, then the import wizard will to be used multiple times on the same file specifying the range of cells each time.

This step includes an option to add the data to a new or existing book. The default book that spreadsheets are added to can be specified using the menu Tools | Spreadsheet Options | then clicking the Books tab.

Step 3: Selection of Cells

This dialog provides a preview of the data on the selected worksheet. The data are truncated to display 8 characters per cell and the first 20 columns and 20 rows of the data rectangle.

You can specify a range of cells by selecting the Specified range option and entering the range in the space provided. When a range is entered the preview will change to display from the top left cell of the range.

Click Next to continue or click Back to select a different worksheet within the file.

Step 4: Selection of Columns

This step provides a preview of the columns of data found in the selected cells. Columns can be omitted by transferring then back into the Available columns list. Only the columns displayed in the selected list will be imported.

By default, Genstat will not include any columns in the worksheet or cell range that do not contain any data. However, if empty columns are to be imported then the Remove empty columns can be unticked.

Click Next to continue or click Back to select a different range of cells within the worksheet.

Step 5: Options Page

The final step provides options specifying the layout of the data within the worksheet or named range. These options are separated on 4 different tabs.

1. Column Names Tab

These options specify the layout of the column names.

Read Column names from File
Controls whether column names are specified within the file.

Yes if all labels	Read the column names from the row specified in the Column names in row option if all the cells in this row contain labels only or are missing. A default column name will be generated for a missing cell.
Yes	Use the cells specified in the Column names in row option. If a cell contains a number, then it will be prefixed with “_” to ensure it is a valid Genstat identifier name.
No	Do not read the column names from the file and generate default names for the columns.

Column names in row
Specifies the row number to use for column names.

Column descriptions in row
When selected, the cells in the specified row number will be used for column descriptions (the EXTRA keyword for Genstat structures).

Row numbers
Controls whether the row numbers are relative to the data range or relate to a whole worksheet.

Relative	Row numbers start from the top of the restricted cell range.
Absolute	Row numbers start from the top of the worksheet and thus correspond to the absolute row numbering in Excel/Quattro.

2. Factors Tab

These options control the conversion of data to factors on loading.

Suggest columns with only a few unique values to be factors
When selected, if columns contain few unique repeated values then you will be prompted with a dialog where these columns can be converted to factors. The Suggest converting columns with <= N unique items option can be used to control the limit for the number of unique items. This option is set on the Tools | Spreadsheet Options | Columns tab.

Sort factor levels into alphabetical or numeric order
When selected, columns loaded as factors will have their levels (or labels for a text column) sorted into ascending order.

Data contains variates & factors only
When selected, any column containing text will be converted to either a variate or factor. If the number of labels within the spreadsheet column is less than 20% of the total number of rows, then the column will be converted into a variate, otherwise the column will be converted into a factor.

Text to Number Conversions
This controls how text labels are interpreted as numbers when a label is changed to a value when converting to a variate.

Strict	Only labels that contain just numeric data are converted, otherwise they are set to a missing value (e.g. ’10’ -> 10, ‘1O’ -> *.)
Single	Single characters and labels containing just numeric data are interpreted as numeric values, otherwise they are set to a missing value (e.g. o,O -> 0, i,I,l,L -> 1, s,S-> 2, z,Z -> 5, comma -> decimal point, ’10’ -> 10, ‘Io’ -> *).
Common	Multiple characters and labels containing just numeric data are interpreted as numeric values, otherwise they are set to a missing value (e.g. ‘Io’ -> 10, ’23X’ -> *).
Standard	Multiple characters and labels containing just numeric data are interpreted as numeric values, but text is truncated at the end of a number (e.g. ’23X’ -> 23, ‘A2X3’ -> *).
Lax	Any digits are read from the text (e.g. ‘A2X3’ -> 23).

Missing value text
By default, empty cells or cells with a text consisting of a single asterisk * are read in as missing values (unless the column is a text column, in which case the * is read in as a text value ‘*’). If an alternative string to * has been used as a missing value place holder in the worksheet then entering this text here will cause this to be interpreted as a missing value in numerical columns. For example, a user may have entered ‘M’ for a missing value or may have added a note ‘Died’ in a column of live weights.

Additional information about conversions

Columns whose names are suffixed with a ! character will be automatically converted into factors. Similarly, if column names end with a $ or # character, then these will be converted into a text and variate respectively. As well as supplying a conversion character, column format information can be appended to a column name. For example, a suffix of :0, :1, :2 … :9 specifies that the column should be displayed with the given fixed number of decimal places. A suffix of :D specifies the column is to be displayed in date format. This will use Genstat’s default date format. Similarly, a suffix of :T specifies that the column is to be displayed in time format.

The order of factor levels and labels can be supplied within an Excel file by adding a comment to the cell containing the column name. Within the comment the factor levels must start with a ! and labels must start with !t on a new line. The levels and labels should then be entered within brackets, for example, for levels this could be !(100,90,50,10) and for labels this could be !T(Control,A,B,C) or !t(‘Control’,’A’,’B’,’C’). The order of the items in the comment will define the order of the levels or labels in the factor. If a column just contains ordinal values (i.e. 1…n), then the comment can still be used to assign labels or levels to these groups. The first item in the comment will define the level or label for group 1 in the factor and so on.

A column description can also be supplied in the comment as a separate line, but this must not start with an exclamation mark !, hash # or dollar $ which are used for marking column types.

The following example shows two comments that have been entered into columns within Excel (using the Excel menu item Insert | Comment). Each column will be opened in Genstat as a factor using the defined levels/labels and will include a column description.

See Using Genstat with Excel for more information on setting up column types and comments.

3. Title/Active Sheet Tab

This controls some general features available for the spreadsheet.

Read title from cell range
When selected, the cell range (e.g. A2:B3) or named range (which must be located on the same sheet as the data) specified in the space provided will be used to read text for the spreadsheet title. Note that text for the different cells within this range will be concatenated.

Set as active sheet
This sets the new spreadsheet that is created as the active spreadsheet.

4. Rows Tab

These options control which rows of data are to be read into the new spreadsheet.

Remove empty rows from the spreadsheet
When selected, any rows that do not include any data are removed from the resulting spreadsheet.

Skip first X non-empty rows
When selected, the first specified number of rows will be excluded from the spreadsheet. This is useful to exclude any comments that made have been made in the spreadsheets. Note that text in these rows can still be used for column names or descriptions (see Column Names Tab).

Action buttons

Back	Got back to the previous step to change the file, cells, columns selected, and any options set.
Next	Accept the options given and move on to the next step in reading in the data.
Finish	Use the specified information to read the data from the Excel file and close the dialog. If any factor columns contain a high proportion of missing values you will be prompted with a dialog that can be used to automatically fill the empty factor cells.
Cancel	Close the dialog without opening the file.