1. Home
2. Generate a Random Subset from a Spreadsheet

# Generate a Random Subset from a Spreadsheet

Use this to create new spreadsheets based on a random subset/sample of rows from a spreadsheet.

Spread | Calculate | Random Subset. ## Number of samples

Specify the number of random samples to be used. Alternatively, you can provide a percentage of the number of rows to be used by selecting the % option. Note that if Sample with replacement option is selected, then the number of samples must be less than the number or rows in the spreadsheet (or 100 %).

## Sample with replacement

When selected, sampling with replacement will be used when forming the subset. That is at each random selection of a row, all the available rows are eligible for selection. If this option is not selected then only the rows that have not been previously selected are eligible for selection.

## Weighting

If a column in the dropdown list is selected, then the values in the selected column will be used in a weighted random sample. The default is the setting, where all rows have equal chance of being selected. Rows with a weight value ≤ 0 will not be included in the random sample.

## Seed

The Seed option is used to specify an integer value that will be used to start the randomization. If a value of * is given for the seed, a value from the computer’s clock will be used.

## Create unique column names

When selected, columns in the spreadsheet will have new names generated for them so that they are unique, otherwise the columns will have the same names as the original spreadsheet.

## Randomize rows

When selected, rows in the resulting spreadsheet will be sorted into a random order.

This lists all open books within Genstat. Select the book that the new sheet will
be added to. If the data are to appear in a new book then select the New Book setting.

The SUBSET procedure can be used in conjunction with the
GRUNIFORM function of
CALCULATE command in the command language to do sampling
with or without replacement.

To sample without replacement P rows out of N, the following commands could be
used:

SORT [INDEX=GRUNIFORM(N;0;1)] !(1…N); Pos
SUBSET [Pos <= P] X,Y; Sample_X,Sample_Y

To sample with replacement P rows out of N, the following commands could be used:
CALC Row = INT(N*GRUNIFORM(P;0;1)) + 1
VARIATE Sample_X,Sample_Y; (X,Y)\$[Row]

Updated on March 18, 2019