Select menu: Spread | Calculate | Random Subset
Use this to create new spreadsheets based on a random subset/sample of rows from a spreadsheet.
- After you have imported your data, from the menu select
Spread | Calculate | Random Subset.
Number of samples
Specify the number of random samples to be used. Alternatively, you can provide a percentage of the number of rows to be used by selecting the % option. Note that if Sample with replacement option is selected, then the number of samples must be less than the number or rows in the spreadsheet (or 100 %).
Sample with replacement
When selected, sampling with replacement will be used when forming the subset. That is at each random selection of a row, all the available rows are eligible for selection. If this option is not selected then only the rows that have not been previously selected are eligible for selection.
If a column in the dropdown list is selected, then the values in the selected column will be used in a weighted random sample. The default is the setting, where all rows have equal chance of being selected. Rows with a weight value ≤ 0 will not be included in the random sample.
The Seed option is used to specify an integer value that will be used to start the randomization. If a value of * is given for the seed, a value from the computer’s clock will be used.
Create unique column names
When selected, columns in the spreadsheet will have new names generated for them so that they are unique, otherwise the columns will have the same names as the original spreadsheet.
When selected, rows in the resulting spreadsheet will be sorted into a random order.
Add to book
This lists all open books within Genstat. Select the book that the new sheet will
be added to. If the data are to appear in a new book then select the New Book setting.
To sample without replacement P rows out of N, the following commands could be
SORT [INDEX=GRUNIFORM(N;0;1)] !(1…N); Pos
SUBSET [Pos <= P] X,Y; Sample_X,Sample_Y
To sample with replacement P rows out of N, the following commands could be used:
CALC Row = INT(N*GRUNIFORM(P;0;1)) + 1
VARIATE Sample_X,Sample_Y; (X,Y)$[Row]