1. Home
  2. Comparing two Spreadsheets

Comparing two Spreadsheets

Genstat can automatically compare two spreadsheets by checking the data values in each cell. You can verify data using the following comparison methods:

  • two spreadsheets that are already open within Genstat.
  • a currently open spreadsheet with an external file, such as another .gsh file or Excel file.
  • a currently open spreadsheet with an ODBC database file.
Automated comparison of two spreadsheet files

    1. From the menu select Spread | Sheet | Compare.

    2. Select a Data source:
      • File – enables the Browse button, which lets you navigate to your second file.
      • Clipboard – compares your open spreadsheet with data that is currently in the clipboard (this is described further down the page).
      • Sheet – makes the Sheet dropdown list visible, so you can select another spreadsheet that you have already opened.
      • ODBC – after you follow steps 3 and 4 below to set the options on the current dialog, click OK to navigate to a file in an existing ODBC database.
        • The ODBC query wizard will prompt you to select the columns you want to compare then click Next.
        • If required, apply filters to specify which rows to include in your query then click Next.
        • Select Run the SQL query and click Finish to run the comparison. Continue from after step 4.
    3. Select a Match columns by option.
      • Position – matches the spreadsheet columns by their position i.e. column one in the second spreadsheet will be appended to column one in the first spreadsheet, etc.
      • Name – matches and appends spreadsheet columns that have the same name, regardless of their position.
    4. Set other options as required (see Comparison options below) or leave them at their defaults then click OK. (The default options are normally sufficient.)
      • If you are comparing an open spreadsheet to an Excel file the Excel Import Wizard will open to let you set import options. You can click Finish to skip this step.

After Genstat has run the comparison you will see one of the following prompts:

If you see the warning prompt above click OK to display the results in the Output window.

Comparison options

Tolerance on numerical comparisons Specifies a tolerance to be used for numerical comparisons. When comparing two numerical values, if the absolute difference is less than this amount then this indicates a match between the items.
Text comparisons ignore case When comparing text values, Genstat will ignore any differences in upper case and lower case. For example ‘GENSTAT’ and ‘Genstat’ will be recognized as the same.
Match columns by
  • Position – matches the spreadsheet columns by their position i.e. column one in the second spreadsheet will be appended to column one in the first spreadsheet, etc.
  • Name – matches and appends spreadsheet columns that have the same name, regardless of their position.
Match rows using ID

By default the data are compared row by row. You can specify an ID column, which will be used to match rows together if the sheets are in a different row order.

If the two sheets below were compared using the ID column Plot, they would compare as the same, since the row with plot 101 is compared with the row with plot 101 in the second sheet. If you did not use an ID column, then all but rows 101 and 103 (which happen to be in the same position in both sheets) would compare as different.

Selected columns only Compares only the selected columns in the current spreadsheet with the other data source.
Abort comparison after finding N differences  The comparison will be aborted once this limit is reached.

Comparing a spreadsheet or columns with clipboard data

Comparing an open spreadsheet with clipboard data lets you compare the entire spreadsheet or just selected columns.

  1. Select the columns you want to compare or select the entire spreadsheet.
  2. From the menu select Spread | Sheet | Compare.

  3. Select Clipboard.
  4. Set options as required (see table below) or leave them at their defaults then click OK. (The default options are normally sufficient.)
Tolerance on numerical comparisons Specifies a tolerance to be used for numerical comparisons. When comparing two numerical values, if the absolute difference is less than this amount then this indicates a match between the items.
Text comparisons ignore case When comparing text values, Genstat will ignore any differences in upper case and lower case. For example ‘GENSTAT’ and ‘Genstat’ will be recognized as the same.
Match columns by
  • Position – matches the spreadsheet columns by their position i.e. column one in the second spreadsheet will be appended to column one in the first spreadsheet, etc.
  • Name – matches and appends spreadsheet columns that have the same name, regardless of their position.
Match rows using ID

By default the data are compared row by row. You can specify an ID column, which will be used to match rows together if the sheets are in a different row order.

If the two sheets below were compared using the ID column Plot, they would compare as the same, since the row with plot 101 is compared with the row with plot 101 in the second sheet. If you did not use an ID column, then all but rows 101 and 103 (which happen to be in the same position in both sheets) would compare as different.

Selected columns only Compares only the selected columns in the current spreadsheet with the other data source.
Abort comparison after finding N differences  The comparison will be aborted once this limit is reached.
  1. The following dialog will appear. Set options as required or leave them at their defaults then click OK. (The default options are normally sufficient.)

After Genstat has run the comparison you will see one of the following prompts:

If you see the warning prompt above click OK to display the results in the Output window.

Sort factor levels Factor columns will have their levels (or labels for a text column) sorted in ascending order.
Suggest columns to be factors If selected, Genstat will prompt you to convert columns to factors where the columns have repeated values, and fewer unique values than the specified number given in the Suggest converting columns with <= N unique items option within the Spreadsheet Options menu.
Remove empty rows / Remove empty columns When selected, Genstat will automatically remove any rows/columns that do not contain data.
Data contains variates & factors only Any column containing text will be converted to either a variate or factor. If the number of labels within the spreadsheet column is less than 20% of the total number of rows the column will be converted into a variate, otherwise the column will be converted into a factor. When reading the column as a variate, common typos such as entering a letter O for the number 0 (or I for 1) will be fixed.
Column descriptions in row When selected, the cells in the specified row number will be used for column descriptions.
Ignore type markers (!#$) in column names When reading in Excel data, the markers (! for a factor, # for a variate and $ for a text) will be ignored and the contents of the column used to decide its column type (variate for numbers only, text for other characters). If this option is not selected, Genstat will use any type markers that are present in the column name to determine the column type. (Markers are used when importing foreign file formats to convert columns to factors, etc. See Converting data types (Variates to Factors, etc). for more information)
Column names in first row

Controls whether column names are specified within the file.

  • Yes if all labels – read the column names from the row specified in the Column names in row option if all the cells in this row contain labels only or are empty. A default column name will be generated for an empty cell.
  • Yes – use the cells specified in the Column names in row option . If a cell contains a number, then it will be prefixed with “_” to ensure it is a valid Genstat identifier name.
  • No – generate default names for the columns, ignoring those in the file.
Commas

This controls how Genstat interprets commas in the clipboard data.

  • Leave – interpret the text with commas as normal (leave commas unchanged).
  • Change to decimal – change commas to decimal places. This may be required to convert data if they are represented in European numeric formatting where a comma is used for a decimal place. For example, 3,14 will be converted to 3.14.
  • Remove – Remove commas from the text. This may be required if commas have been used when representing numbers, for example 21,000.
Missing value text Lets you supply an alternative text to ‘*’ for missing values, so for example ‘NA‘ could be used to represent a missing value.
Check columns for date values If selected, Genstat checks all text columns to see if they contain data in date format. If columns appear to contain data in date format you will be prompted to convert these to date values.
Date format This allows you to select a date format or create a custom format.

Updated on June 20, 2019

Was this article helpful?