1. Home
  2. Generate a Random Subset from a Spreadsheet

Generate a Random Subset from a Spreadsheet

Select menu: Spread | Calculate | Random Subset

Use this to create new spreadsheets based on a random subset/sample of rows from a spreadsheet.

  1. After you have imported your data, from the menu select 
    Spread | Calculate | Random Subset.

Number of samples

Specify the number of random samples to be used. Alternatively, you can provide a percentage of the number of rows to be used by selecting the % option. Note that if Sample with replacement option is selected, then the number of samples must be less than the number or rows in the spreadsheet (or 100 %).

Sample with replacement

When selected, sampling with replacement will be used when forming the subset. That is at each random selection of a row, all the available rows are eligible for selection. If this option is not selected then only the rows that have not been previously selected are eligible for selection.

Weighting

If a column in the dropdown list is selected, then the values in the selected column will be used in a weighted random sample. The default is the setting, where all rows have equal chance of being selected. Rows with a weight value ≤ 0 will not be included in the random sample.

Seed

The Seed option is used to specify an integer value that will be used to start the randomization. If a value of * is given for the seed, a value from the computer’s clock will be used.

Create unique column names

When selected, columns in the spreadsheet will have new names generated for them so that they are unique, otherwise the columns will have the same names as the original spreadsheet.

Randomize rows

When selected, rows in the resulting spreadsheet will be sorted into a random order.

Add to book

This lists all open books within Genstat. Select the book that the new sheet will
be added to. If the data are to appear in a new book then select the New Book setting.

See also

Split or Subset a Spreadsheet
Randomize Rows in a Spreadsheet
Duplicate a Spreadsheet
Spreadsheet Manipulate Menu
Spreadsheet Calculate Menu

The SUBSET procedure can be used in conjunction with the
GRUNIFORM function of
CALCULATE command in the command language to do sampling
with or without replacement.

To sample without replacement P rows out of N, the following commands could be
used:

SORT [INDEX=GRUNIFORM(N;0;1)] !(1…N); Pos
SUBSET [Pos <= P] X,Y; Sample_X,Sample_Y

To sample with replacement P rows out of N, the following commands could be used:
CALC Row = INT(N*GRUNIFORM(P;0;1)) + 1
VARIATE Sample_X,Sample_Y; (X,Y)$[Row]

Updated on March 18, 2019

Was this article helpful?