Genstat contains a spreadsheet facility that can be used to view and manipulate data. The behaviour of a Genstat spreadsheet differs to other spreadsheet packages such as Microsoft Excel. However, Genstat spreadsheets contain many features similar to Microsoft Excel, plus some additional enhancements for ease of use.
Similarities between Genstat and Excel
Genstat provides a spreadsheet facility that can be used to view and manipulate data cell by cell and includes the ability to easily scroll and move around the cells. Many shortcut keys operate in the same manner in both Genstat and Microsoft Excel. For example, the shortcut keys to copy (CTRL+C) and paste (CTRL+V) data from a range of cells via the Clipboard can be used in both Genstat and Excel. Other shortcut keys in common between the two applications are CTRL+A for selecting all cells, CTRL+O for opening files, CTRL+P for printing, CTRL+S for saving and CTRL+Z to undo any changes. An example Excel workbook and the corresponding Genstat spreadsheet are displayed below:
Within Genstat the cursor and selection keys operate in the same way as in Excel. However, Genstat provides additional control for the selection of columns or cells and this can be customized within the Spreadsheet Column Options tab of the spreadsheet options. There are three options that can be used to control the behaviour for selecting columns or cells: Select columns with a single mouse click, Keep selected columns, and Clear selection on moving current cell.
Differences between Excel and Genstat spreadsheets
In Excel the workbook is used for data storage, the manipulation and all results are displayed within the workbook. In Genstat the data is one aspect of the analysis, and there are a range of other windows for displaying output, audit trails and graphs. The data in Genstat does not need to be displayed within a spreadsheet as it is all held within a central data storage.
The File and Edit menus contain similar functionality between both applications. However, a major difference is that in Genstat all of the functionality for manipulating the data in a spreadsheet can be found in the Spread menu, and within Excel these are found on several different menu items.
Where is the Genstat Spreadsheet?
When Genstat starts up, unlike Excel, you do not have an empty spreadsheet open. However, you can specify for an empty spreadsheet to be created on start-up by selecting the New Spreadsheet option on the Options – General tab. In Genstat an empty spreadsheet can be created by either selecting the File | New menu or by selecting Spread | New | Create. In each case this will open the File New – Spreadsheet dialog where you can specify the number of columns and rows to appear in the spreadsheet. Data from Excel (or another application) can be imported by selecting the File | Open menu and selecting the Excel file name.
In addition to the standard copy and paste functions Genstat includes a facility to create a new spreadsheet from the data on the Clipboard. The spreadsheet can be created by selecting Spread | New | From Clipboard. This opens a dialog providing options for controlling how the spreadsheet is to be created.
One of the major differences between Genstat and Excel is that Genstat stores its data within data structures in a central data core rather than on a particular worksheet. The Genstat spreadsheet can be used to view or manipulate the data. An Excel workbook is an unstructured set of cells containing numbers, text and formulae in any order, whereas a Genstat spreadsheet displays a set of specific data structures with a particular size and attributes. The unstructured nature of Excel can be considered as both a strength and weakness as the organisation is specific to the user, and not inherent in the spreadsheet. In Genstat the spreadsheet is column based where the data are stored in a single column for each measurement. Each column can contain data in one of three formats: numerical (a variate), textual (a text) or categorical (a factor) data. Numerical and textual data cannot be mixed within the same column. If an attempt is made to enter data of an incorrect type (for example text into a numerical column) error messages will appear.
Note that entering a numerical value into a text column will result in the numerical value being treated as text. The columns within the spreadsheet must all have the same number of rows. However, you can have multiple spreadsheets or a book of spreadsheets containing structures of different lengths.
For categorical data Genstat has a data structure type called a factor that can be used to specify an allocation of the data into groups. It allows a limited set of values, one for each group. The groups can be referred to by numbers known as levels and by text known as labels. If an Excel worksheet contains categorical columns (numerical or text) these can easily be converted to factors within Genstat by selecting Spread | Factor | Convert to. See understanding factors within a spreadsheet for more details on factors.
Column Names in Excel and Genstat
As data are stored in Genstat data structure the column names are not just another standard cell that can be edited like all others as in Excel. The column names represent the names of the actual data structures and these can be changed by moving the cursor to the start of the column name (where the cursor changes to a pencil) and clicking the left mouse. The structure names have the limitation that duplicate names cannot be used within a single spreadsheet, unlike Excel which allows this. If you attempt to enter a name already used, then a warning prompt will appear.
The type of a column within Genstat can be either a variate (numerical values), text (strings) or a factor whereas in Excel you can have a mixture of cell types within a column. When importing data from Excel to Genstat any column containing a text item will be read into Genstat as a text column (and any numbers within the column will be converted to the text string representing its value). Within Excel you can use a type marker in the cell containing the column name to specify the Genstat column type the data represent when the data are imported into Genstat. This can be done in two ways:
- As a suffix on the column name
- As a comment on the column name cell
The following type markers can be added to the column name/cell comment:
- ! – Interpret the column as a factor
- # – Interpret the column as a variate
- $ – Interpret the column as a text
- :D – Display the column as a date
- :T – Display the column as a time
- :0…:9 – Display the column with 0-9 decimal places
- :E – Display the column in scientific format
The ! marker can be applied in conjunction with the Date :D or Time :T markers, so that !:D would indicate a factor with levels displayed as dates.
In Excel comments for the cells containing the column names can be used to transfer additional information to Genstat. The comments can contain the following information:
- The factor labels or levels and their ordering
- The column type (!,#,$ etc…)
- The column description
Factor labels can be supplied within a comment using the usual Genstat format where the labels are supplied in a comma separated list preceded by the characters !T( , and terminated with a closing bracket ). If any of the labels in the list contain one or more of the characters ‘!’, ‘,’ or ‘)’ (exclamation mark, comma or right bracket), then the label must be contained in single quotes. Note that if a label contains a single quote this must be duplicated. For example, the string (Pierre David’s) should be supplied as ‘(Pierre David”s)’. If labels do not include any of the reserved characters then you do not need to use single quotes. For example, the labels A, B, C and D-E could be supplied in the comment as !T(A,B,C,'(D-E)’).
Factor levels can be specified as a comma separated list of values preceded by the characters !( and terminated with a closing bracket ). If any of the values in the list is not a valid number, then the list will be interpreted as the labels replacing the !( with !T( when importing the data. Note that when comments are supplied the type marker ! is ignored.
The column types (! for factor, # for variate or $ for text) and display formats (:D etc…) can be given at the start of the comment as described above.
The column description can be supplied as a plain string of text but cannot start with any of the reserved characters, e.g. ‘!‘, ‘$‘, ‘#‘ or ‘:‘.
You can combine the factor label/level specification and column description into a single comment. These must be specified in the following order separated by a new line (use SHIFT+ENTER to enter this into the comment in Excel):
- the factor labels or levels
- the column type or display symbols(!,#,$ etc…)
- the column description
The following shows an example of a comment that could be used to supply factor labels, but with the levels in date format with a description:
:D Seasonal Summary
If the values for a factor are stored using the ordinal values (1,2,3…) within Excel, then the comments could be used to specify the levels or labels to be used within Genstat. For example, a column called Sex in Excel may have the data stored as the values 1 and 2. By using a comment, labels could be supplied as !T(Male,Female) to represent the values within in Genstat.
If the data for a factor within Excel has the actual levels or labels specified then the cell comment can be used to specify the order of these to be used in Genstat. For example, if a column contained a factor with labels AR1, AR37, Nil and Wild, then by default these would be read into Genstat in alphabetical order. Using the cell comment these could be given an alternative order such !T(Nil,Wild,AR1,AR37). Similarly if a column contained a factor with levels 0.5,1,2,10 then a reordering such as !(10,2,1,0.5) could be used to specify the levels should be stored in decreasing order.
An example of an Excel cell comment is shown below.
Size of the Spreadsheet
The size of a spreadsheet in Genstat is set to the length and number of measurements that are to be viewed or manipulated. Genstat does not create the maximum sized sheet just in case you want to enter a large amount of data, as this can be inefficient. For entering data you need to specify the size of the spreadsheet to receive the data. If this is not known then rows and columns can easily be added or removed at a later stage. When a spreadsheet is saved or updated Genstat will remove empty rows and columns if the option Remove empty rows and columns on save is selected in the Spreadsheet File Options tab (opened by selecting the Spreadsheet Options item on the Tools menu).
The default size of an empty Genstat spreadsheet opened on startup is 10 columns by 100 rows. You can change the default size of a Genstat spreadsheet using the Options – Workspace dialog (opened by selecting Options on the Tools menu). If a new spreadsheet is opened using the File menu then you can specify the size of the spreadsheet within the File New – Spreadsheet dialog.
Each Excel worksheet is the same size, that is 256 columns by 65,536 rows in Excel 2003 or earlier and 16384 columns and 1,048,576 rows in Excel 2007 onwards. Within Genstat a single Genstat spreadsheet can hold up to 16,000 columns and an unlimited number of rows (provided you have enough memory on your computer). You can specify the maximum number of columns allowed within the File tab on the Spreadsheet Options.
What if the Excel data is not column based?
Sometimes data within Excel may be entered across rows or in several columns. Genstat provides several menus under the Manipulate item in the Spread menu to help reorganise data into the correct format for Genstat. These facilities include the Stack Spreadsheet which can be used to stack multiple columns into a single column. The Transpose menu can be used to transpose rows and columns. In addition to these there is a Paste Special option in the Edit menu that can be used reorganise data before pasting from the Clipboard into a spreadsheets cells.
A Genstat book can contain multiple spreadsheets, where the limit can be set in the spreadsheet options up to a maximum of 2000 sheets. The Book menu provides facilities for managing multiple sheets, including copying, moving, deleting or splitting sheets. The tabs for the sheets are displayed across the top of the window, whereas in Excel the tabs are displayed at the bottom. The tabs can be dragged into new positions, onto other books or into a book on their own (depending on where the tab is dropped).
Excel has different workbook page types such as data, chart and Visual Basic script. Genstat spreadsheets do not contain graphics. However, they can contain an analysis script of Genstat commands that can be entered and run using the Spreadsheet Analysis menu. Genstat has six types of data spreadsheets, corresponding to the range of data structures found in Genstat. Apart from the standard vector (variate/text/factor) spreadsheets, there are three matrix types (rectangular, symmetric and diagonal), a table type that holds cross-tabulations (like the Excel pivot tables) and a scalar spreadsheet that holds single valued constants.
Within Genstat calculations are performed on whole columns rather than cell by cell. As with Excel, you need to specify a formula or expression to perform the calculation which can be done using the calculations menu.
Genstat’s calculations are not dynamic like Excel’s, so if you change any of the values within cells and want to recalculate dependent cell values, you need to use the recalculate menu to recalculate the results.
Genstat maintains two copies of the data, one visible in a spreadsheet, and the other hidden within a central data core (the server). The names of the data structures in the data core can be viewed at any time within the Data View. By default the two copies of the data are synchronized whenever you move the focus from the spreadsheet to another window. You can change to manually update Genstat from a spreadsheet by turning off the Auto update Genstat option within the spreadsheet options. To manually synchronize the data from the central data core with the spreadsheet select Spread | Update | Changed data to Genstat, or alternatively press the F10 key. This advantage of updating manually is that you can have multiple sheets open with data structures having the same name, whilst retaining only the copy of the latest updated spreadsheet within the central data core. Also this allows manipulation of the data in the spreadsheet without disturbing on-going calculations in Genstat (which can run concurrently with interface operations).
In Excel, dates are stored in a cell as the number of days since the 1st January 1900. A date format is applied to this number, often automatically if it has been initially typed in a date format (e.g dd/mm/yyyy).
Note: Excel has an option to base dates from 1904 rather than 1900. This is allowed for in reading in columns marked as dates in directly from the files. If the column is not marked as a date or time column, a value of 1462 must be added to these column for the dates to appear correctly in a Genstat spreadsheet. Excel does not pass the base date as part of its BIFF format, and so if creating date columns using the Clipboard when the base date in Excel is 1904, this will produce incorrect dates, and the only way to get correct dates is to use the File | Open menu or the Excel import wizard.
Genstat stores dates as the number of days from either 1st January 1900 or 1st March 1600. A date format can also be applied to a complete column (not cell by cell as in Excel). Date formats are not applied automatically in Genstat, but must manually be applied using the Date Formats dialog (opened from the Spread | Column| Attributes/Format item), by selecting a format type of Date and then clicking the Date Format button.
If dates are copied from Excel to Genstat via the Clipboard, they are transferred as text, resulting in a text column. However, if Genstat detects that the text may be a date format, it will prompt to convert the text to a date format. Text columns can be converted to a numerical value displayed as a date by selecting Spread | Column | Convert. This opens the Date Formats dialog where you can select the new column type as a Variate and select the Read Text as Date option. The Date Format button to specify the particular date format used by Excel.
The text :D can be appended to a column name in Excel to identify a column as a date column. When this text is appended the column will be automatically interpreted as a date format whether it is pasted from the Clipboard or is imported from an Excel file. Similarly the text :T can be used to identify a column as a time column, and the column will be automatically interpreted as a time format.
Comments in a cell containing the column name can also be used to set a Date
Generally the output of statistical calculations is displayed within the Output window or within Graphics windows. However, you can save results of statistical analyses to a spreadsheet using the Save buttons on the Stats menus. The Display as Spreadsheet option on the save dialogs will create spreadsheets containing the results.
Enhancements over Excel
Genstat has a number of drag and drop facilities for moving and deleting rows or columns within books or spreadsheets. Also, Genstat’s Clipboard facility is more general than Excel’s, not only providing a menu of options, but recycling data to fill a selection when pasting from the Clipboard.
Moving data between Excel and Genstat spreadsheets
The Excel Import Wizard allows data to be imported from an Excel file. Note that if the file is currently open in Excel it should be saved to make sure that the file on the disk is up to date. Data can also be pasted from the Clipboard by selecting the from Clipboard item on the New option of the Spread menu, or can be pasted directly into an existing Genstat spreadsheet using the Edit menu items. See Transfer data between Windows and Spreadsheets for more information on copying data using the Clipboard.
Excel has pivot tables, where a range of data is selected, and then columns are selected to provide cross-tabulation groups and other columns selected for data summaries (averages, counts etc.). The resulting table is embedded into a sheet in the workbook.
Genstat has menus for creating tables. The Frequency Tables menu produces tables of counts, and the Summary of Groups menu produces cross-tabulations of the usual data summaries for selected variate columns.
The results from Genstat are shown in a special table spreadsheet. The layout of the table may be changed by selecting the Spread | Manipulate | Reorder Table menu or by dragging the columns into new positions. The Spread | Restrict/Filter item can also be used to subset the rows displayed in the table. The Copy Special menus may be used to copy the table to Rich Text Format (RTF Table) or to Excel (Tabbed Text) via the Clipboard.
Excel has menu items to filter the rows displayed in a spreadsheet. Genstat provides many facilities for filtering the rows of a spreadsheet. These are found under the Spread | Restrict/Filter menu. In Genstat terminology, a filter is also known as a restriction as it also restricts the rows used in any analysis, whereas a filter in Excel does not effect the cell calculations. Also in Genstat the filtered rows can be displayed in another colour using the Spread | Restrict/Filter | Display Restricted Rows menu. See the Spreadsheet Restrict/Filter menu for more details.
Graphs created in Excel can either be embedded in a data page, or added in a separate chart page in the workbook. These graphs will dynamically update if the data is changed.
Genstat has a graph selector, which lets you select a graph based on the type of data structures. In addition, Genstat has a suite of graphics menus that can be used to create graphs. A graph produced within Genstat is not dynamic, and is not stored in the spreadsheet. Graphs within Genstat are displayed in a separate graphics window. The graph can be saved in different formats and the display attributes (spans, colours, symbols, line styles, etc.) can be edited by double-clicking on the graph and using the various edit options under the Tools menu in the Genstat Graphics Viewer.
Opening .XLS files produced by Genstat into Excel 2010 or later
Excel 2010 or later always generates a warning on opening a .XLS file (the Excel 95-2003 file format) that was not saved by Excel. You will get a warning saying: Office has detected a problem with this file. Editing it may harm your computer. Click for more details. If you click this warning and then click the Edit Anyway button, the file will open as expected, with no further issues. Saving the file with Excel will stop this happening in the future. However, if using Excel 2010 or later, it is always best to use the .XLSX file format, and then this will not happen.
Limits in transferring data
The maximum number of sheets or pages in a Genstat book is a user defined limit between 20 and 4000, which can be set on the Spreadsheet Options, Books tab. When the specified limit of pages is reached, Genstat will start adding subsequent pages to a new book.
The maximum number of columns in Excel is 16,384 for Excel 2007-2013 and 256 for prior versions. Genstat will not be able to save more than this number of columns to Excel. The maximum number of columns in a Genstat spreadsheet is a user defined limit between 250 and 16,000, which can be set on the Spreadsheet Options, File tab. Genstat will not read in more columns from Excel than the specified limit.
The maximum number of rows in Excel is 1,048,576 for Excel 2007-2013 and 65,535 for Excel 97-2003 and 16,384 for prior versions. Genstat will not be able to save more than this number of rows to Excel, depending on the file version being saved. The only limit to the number of rows in a Genstat spreadsheet is the amount of memory available on the computer.
The maximum factor label length in Genstat is 60 characters. Reading cells in a factor which contain text longer than this will result in the labels being truncated. The maximum text length in a cell in Excel is 65,535, and in Genstat it is a user defined limit between 90 and 10,000, which can be set on the Spreadsheet Options, Conversions tab. Any cells read from Excel that have text longer than the specified limit will be truncated.