# SAMPLE procedure

Samples from a set of units, possibly stratified by factors (P.W. Lane).

### Options

`SEED` = scalar Seed for the random number generator; default 0 i.e. continue from previous generation

`NVALUES` = scalar Number of units from which a simple sample is to be taken; default `*` i.e. as defined by `UNITS` statement

### Parameters

`NSAMPLE` = scalars or tables Number of values in simple sample, or table of numbers of values at each combination of levels of its classifying factors; no default

`SAMPLE` = variates Structure to store the result; no default

### Description

Procedure `SAMPLE` produces a random sample from a set of units. A simple sample can be obtained by setting the `NSAMPLE` parameter to the required number in the sample, and the `NVALUES` option to the number of units in the set. The `NVALUES` option can be omitted if the required number of units has been defined by a `UNITS` statement earlier in the job.

For a stratified sample, the `NSAMPLE` parameter should be set to a table containing the required number of units to be sampled at each combination of levels of the factors classifying the table. The `NVALUES` option is not then relevant, as the set of units is determined by the values of the classifying factors.

The `SAMPLE` parameter must be set to an identifier, which will be formed into a variate containing a set of `NSAMPLE` integers in the range (1…`NVALUES`), obtained by random sampling without replacement. The `SEED` option can be set to define a starting value for the random numbers used to select the units. This can be omitted if some random numbers have already been generated during the current job; `SAMPLE` will then take the numbers that continue the previous sequence.
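The seeding behaviour described above can be illustrated outside Genstat. The following is a minimal Python sketch, not the procedure's own code: it uses Python's standard `random` module, whose generator differs from Genstat's, so the numbers drawn will not match those from `SAMPLE`. The variable names are hypothetical.

```python
import random

# Analogous to SEED=55326: start the generator from a known value.
rng = random.Random(55326)

# Analogous to NSAMPLE=10 with NVALUES=100: ten distinct unit numbers
# from 1..100, i.e. sampling without replacement.
first = rng.sample(range(1, 101), 10)

# Analogous to omitting SEED on a later call: the generator is not
# reseeded, so this draw continues the previous sequence.
second = rng.sample(range(1, 101), 10)

# Restarting from the same seed reproduces both draws in order.
rng_again = random.Random(55326)
repeat_first = rng_again.sample(range(1, 101), 10)
repeat_second = rng_again.sample(range(1, 101), 10)

print(first)
print(second)
```

Setting the seed explicitly is what makes a sample reproducible; leaving it unset after earlier generation gives a new sample that still depends on the original seed.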

### Method

For a simple sample, a full set of units (1…`NVALUES`) is randomly ordered and the first `NSAMPLE` values are taken. For a stratified sample, the units are sorted according to levels of the classifying factors (after random ordering) and then the requested number of values are taken for each combination of levels.
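The two steps above can be sketched in Python. This is an illustration of the method as described, not the Genstat source: the function names, the dictionary standing in for the `NSAMPLE` table, and the list of level tuples standing in for the classifying factors are all assumptions made for the example.

```python
import random
from collections import defaultdict


def simple_sample(nsample, nvalues, rng):
    """Randomly order units 1..nvalues and keep the first nsample."""
    units = list(range(1, nvalues + 1))
    rng.shuffle(units)
    return units[:nsample]


def stratified_sample(numbers, factors, rng):
    """Take numbers[combo] units from each combination of factor levels.

    numbers: dict mapping a tuple of levels to the count required
             (stands in for the NSAMPLE table).
    factors: list of level tuples, one per unit; unit i is factors[i - 1].
    """
    units = list(range(1, len(factors) + 1))
    rng.shuffle(units)  # random ordering first
    # Stable sort by levels: units within a stratum keep their random order.
    units.sort(key=lambda u: factors[u - 1])
    by_stratum = defaultdict(list)
    for u in units:
        by_stratum[factors[u - 1]].append(u)
    chosen = []
    for combo, n in numbers.items():
        if n > len(by_stratum[combo]):
            raise ValueError(f"not enough units at levels {combo}")
        chosen.extend(by_stratum[combo][:n])
    return sorted(chosen)


rng = random.Random(55326)
selected = simple_sample(10, 100, rng)

# 36 units: F1 has levels 1..3 (12 units each); within each F1 level,
# F2 is level 1 for six units and then level 2 for six units.
factors = [(f1, f2) for f1 in (1, 2, 3) for f2 in (1, 2) for _ in range(6)]
# One unit where F2 is 1 and two units where F2 is 2, at every level of F1.
numbers = {(f1, f2): f2 for f1 in (1, 2, 3) for f2 in (1, 2)}
chosen = stratified_sample(numbers, factors, rng)
print(selected)
print(chosen)
```

The layout of `factors` and `numbers` mirrors the structures built in the example below, so `chosen` holds nine unit numbers in total.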

### Action with `RESTRICT`

The factors classifying the table must not be restricted. The procedure cannot be used on a restricted set of units.

### Example

```
CAPTION  'SAMPLE example',\
'1) select a random sample of 10 out of 100 units;'; STYLE=meta,plain
SAMPLE   [SEED=55326; NVALUES=100] NSAMPLE=10; SAMPLE=Selected
PRINT    Selected; DECIMALS=0
CAPTION  !t('2) select specified numbers of units at each combination',\
'of levels of two factors.')
FACTOR   [LEVELS=3; VALUES=12(1...3)] F1
&        [LEVELS=2; VALUES=6(1,2)3] F2
TABLE    [CLASSIFICATION=F1,F2; VALUES=(1,2)3] Numbers
SAMPLE   NSAMPLE=Numbers; SAMPLE=Chosen
CAPTION  'Show which units and factor combinations have been selected.'
VARIATE  [VALUES=1...36] Unit
RESTRICT Unit,F1,F2; EXPAND(Chosen; 36)
PRINT    Unit,F1,F2; DECIMALS=0
CAPTION  'Demonstrate that the correct numbers of units have been chosen.'
TABULATE [CLASSIFICATION=F1,F2; COUNT=Check]
PRINT    Numbers,Check; DECIMALS=0
```
