1. Home
  2. Conversions

Conversions

These options control how columns are converted between data types (factors, variates and texts) and how data is added to columns.

  1. To access this dialog select Tools | Spreadsheet Options then click the Conversions tab.
  2. Set options as required then click OK to close the dialog.

Sort factor levels on convert When converting text or variate columns into factors, this option will sort the numeric levels (or text labels) into ascending order.
Suggest filling in missing factor levels

Selecting this option opens a dialog when a factor column has a large number of missing values and is being read from a foreign data source such as Excel. The dialog lets you carry out a fill down action to copy values into cells with missing values.

Allow new factor levels in edit This lets you add new factor levels/labels to a spreadsheet by typing them directly into a cell and overwriting the original contents. Entering a level or label that doesn’t exist in the factor will open up a dialog querying if you want to add a new level/label to the factor, or change the entered value to the closest matching level/label.
Suggest converting columns with <= unique items to factors

Selecting this option opens a dialog when a column has repeated values, but has fewer unique values than the number specified here. The dialog lets you convert the columns to factors.

Tolerance on assigning values to nearest level

When you enter a numerical value into cell in a factor column, if the number is not an existing factor level, it will be assigned to the closest existing level, provided the difference between the entered number and that level is less than the provided tolerance (which must be positive). A value of 0 will always raise an error on any mismatch.

For example, you have existing factor levels 10, 20 and 30 and a tolerance of 1. Entering 9 or 11 is within tolerance of level 10, so Genstat will convert your value to this level. Any other values close to 10 will result in the error message below.

Tolerance on creating levels from variate

A positive value in this field will be used as a tolerance in creating levels from a factor. If you enter two values that differ by less than this tolerance, then they will be assigned to the same factor group.

The default value of 0 places all unique values into their own group. If the variate being converted to a factor contains calculated values, then these values may differ in their final few significant figures.

For example, the calculated values 1/3 and 200/600 may be assigned to different groups depending on how the final figure rounds. A small value such as 1e-12 could then be used to allocate these two distinct values to the same group.

Remove unused factor levels on update

If selected, when updating Genstat, any factor levels or labels that are not used will be removed. For example, if a factor column Weekdays does not contain an entry for Tuesday, the label Tuesday will be removed from the set of labels when you next update Genstat.

This option is only applied once, either when a column is new or is imported from a foreign source.

Use standard factor orders When selected, if all labels in a factor match either the compass points, weekday or month names, or the User defined label order (below), then these labels will be sorted into the order of the matching list
User defined label order

This opens the User Label Sort Order dialog, which allows a list of labels to be entered. If the labels in a factor match the user defined labels, then they will be sorted into the order in this list. Multiple orders may be entered by providing these in sections.

For example a colour order and a species order can be entered by providing the list of labels: Black, Blue, Green, Red, Yellow, White, followed by Clover, Lotus, Ryegrass, Cocksfoot.

Ignore case Specifies that individual lower and upper case characters in text columns will be ignored during certain operations, such as sort, search, and convert to factors. In other words, case will not affect these operation.
Ignore blanks Leading and trailing blank characters in text columns will be ignored during certain operations, such as sorting and converting to factors.
Check for date values When selected, Genstat checks all new text columns to see if they contain data in date format. If columns appear to contain dates the Convert Text to Date dialog will prompt you to convert these to dates. Check for date values can also be enabled/disabled when you paste columns from the clipboard, merge multiple files or append multiple files.
Text to number conversions

Sometimes numbers are accidentally mistyped as letters. Common mistypings include i, I, l, L for the number 1; o, O for the number 0; a comma for a decimal point, etc. Text to number enables Genstat to convert text labels by substituting numbers for letters or, where this is not possible, the label is converted to a missing value.

  • Strict – Change text strings to numbers only when there are no non-numeric characters in the string. For example, the text ’23’ is converted to 23, the text ‘1a’ contains an alphabetic character so the whole label is converted to a missing value.
  • 1 substitution – Allow a single number to be mistyped and substitute this with the appropriate number. A single lower case ‘o’ or upper case ‘O’ is converted to 0: lower case ‘i’, upper case ‘I’, lower case ‘l’, or upper case ‘L’ is converted to 1. Lower case ‘s’ or upper case ‘S’ is converted to 2. A comma is converted to a decimal point. For example, ‘1o’ contains a single alphabetic character so is converted to 10, ‘1,i’ contains two a comma and an alphabetic characters so is converted to a missing value.
  • Common substitution – Allows multiple numbers to be mistyped and converts them to the appropriate numbers. For example, ‘LOO’ is converted to 100.
  • Standard – The same as Common substitution, but any letters at the end of a number are ignored. For example, ’23o’ is converted to 23, ‘i20’ has an alphabetic character at the front so the whole label is converted to a missing value.
  • Lax – Change all text numbers to actual numbers and remove any non-numeric characters. For example, ‘A23X’ is converted to 23.
Maximum text length The maximum text length in Genstat is a user-defined limit between 90 and 10,000. You can select a limit from the dropdown list or type your own value. Texts that are read from any data source which are longer than the specified limit will be truncated.
Unicode in Excel XLSX files

When reading an Excel xlsx file, this option controls what happens to Unicode characters that are not supported by the extended ASCII character set. Some Unicode characters, such as Greek or mathematical symbols, can be displayed using the Genstat typesetting commands used by the PRINT directive. However, many foreign language Unicode characters cannot be displayed by Genstat and these will be removed with a warning. The options below control how the supported Unicode characters are handled.

  • Convert Unicode to Genstat typesetting – The Unicode characters are converted to equivalent typesetting strings, e.g. α is converted to ~{alpha} and √ to ~{sqrt}.
  • Convert Unicode to ASCII characters – The Unicode characters are converted to the nearest ASCII characters, e.g. α is converted to a and √ to sqrt.
  • Remove all Unicode characters – All Unicode characters are removed.

The conversion of Greek letters to ASCII characters is given in the table below.

Auto extend on pasting cells When pasting data from the clipboard new rows or columns will be added to the sheet as required. If this option is not enabled clipboard data will be truncated at the current row and column extents.
Auto delete on cut When data are cut from a spreadsheet they are replaced by missing values by default. If this option is enabled, Genstat will delete entire rows or columns that are cut, rather than filling the cells with missing values.
Calculations

When a variate created from a calculation is converted to a factor, re-calculating the column may generate some invalid factor levels. For example, if the column C1 contains the values 1, 2, 3 and 4 and the column C2 is calculated as C2 = 2*C1, this will produce the values 2, 4, 6 and 8. After converting C2 to a factor, if a value is changed in C1 to 5 this will create a recalculated value of 10 which is not a valid factor level (2, 4, 6 or 8) of C2. This item controls what happens when converting the variate to a factor:

  • Always remove calculation – The column’s calculation will always be removed.
  • Always keep calculation – The column’s calculation will always be kept.
  • Prompt to remove or keep – A dialog will display, giving you a choice to remove or keep the column’s calculation
Defaults Resets the spreadsheet options on this tab to their default values.
Updated on March 1, 2019

Was this article helpful?