Constructs stratified random samples (S.D. Langton).

### Options

`PRINT` = string token | Controls printed output (`list`, `summary`); default `summ` |
---|---|
`SAMPLE` = variate | Saves the sample, as unit numbers of sampled units when `METHOD=sample`, or as a logical (1 or 0) variable indicating sampled or unsampled units when `METHOD=population` |
`STRATUMFACTOR` = factor | Saves the stratification factor |
`CLUSTERS` = factor | Specifies a factor indicating groupings of units for a cluster sample; default `*` i.e. sample individual rows |
`NUNITS` = table, scalar or variate | Numbers of units in the full data set for each level of the `STRATUMFACTOR` |
`NSAMPLE` = table, scalar or variate | Numbers, or proportions, of units to sample for each level of the `STRATUMFACTOR` |
`SFLEVELS` = variate | Levels for the stratum factor, if it has not already been declared |
`SFLABELS` = text | Labels for the stratum factor, if it has not already been declared |
`METHOD` = string token | Whether `SAMPLE` should contain the numbers of the units sampled from the population, or be a variate with a value for every unit of the full population containing 0 or 1 for unsampled and sampled units respectively (`population`, `sample`); default `samp` |
`NUMBERING` = string token | Whether to number units within each stratum, or across the whole population (`withinstratum`, `population`); default `with` |
`SEED` = scalar | Seed for the random number generator; default 0 i.e. continue from previous generation |

### Parameters

`OLDVECTOR` = variates, factors or texts | Data from the full survey |
---|---|
`NEWVECTOR` = variates, factors or texts | Data for the sample |

### Description

`SVSAMPLE` forms random samples for stratified random surveys. It can also be used to construct a new dataset containing only the sampled units. Groups of units can be sampled together, to allow cluster or multistage sampling.

`SVSAMPLE` is easiest to use when the vectors (variates, factors or texts) representing all the units in the population have already been created. For a single-stage sample, the number of units to be sampled is then specified by setting the `NSAMPLE` option to a table classified by the stratification factor; if the values of `NSAMPLE` are all less than 1, these are taken to be proportions to sample. `NSAMPLE` can be set to a scalar for unstratified samples. By default, individual units are sampled, but the `CLUSTERS` option can supply a factor to define clusters of units that are to be sampled together.
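
As a minimal sketch of the proportion form of `NSAMPLE` described above, the following asks for half of each stratum rather than a fixed count. The identifiers `Unit`, `Stratum`, `Prop`, `Sampno` and `Sampunit`, and the seed value, are hypothetical. How a proportion is rounded to a whole number of units is not stated here, so the summary output is the place to check the numbers actually drawn.

```genstat
" hypothetical population of 10 units in two strata "
VARIATE [VALUES=1...10] Unit; DECIMALS=0
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
" every value is below 1, so they are taken as proportions to sample "
TABLE [CLASSIFICATION=Stratum; VALUES=0.5,0.5] Prop
SVSAMPLE [PRINT=summary; NSAMPLE=Prop; SEED=1234; SAMPLE=Sampno]\
  Unit; NEWVECTOR=Sampunit
PRINT Sampunit
```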

The sample can be saved, in a variate, by the `SAMPLE` option. If the `METHOD` option is left at its default setting of `sample`, `SAMPLE` will contain the numbers of the sampled units. By default, the units are numbered separately within each stratum, but you can set option `NUMBERING=population` to number the units across the whole population. Alternatively, with option `METHOD=population`, the `SAMPLE` variate will have a value for every unit in the population: one for the sampled units and zero for the unsampled units. The `STRATUMFACTOR` option can save a factor indicating the stratum to which each sampled unit belongs.
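
The two forms of the saved sample might be compared with a sketch like the one below. It assumes the same small population as the example at the end of this page; the names `Unitno`, `Insample` and `Sampstrat` are hypothetical. The two calls are independent draws, and nothing here should be read as a guarantee that they select the same units.

```genstat
VARIATE [VALUES=1...10] Unit; DECIMALS=0
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
TABLE [CLASSIFICATION=Stratum; VALUES=2,3] ns
" unit numbers, counted across the whole population "
SVSAMPLE [PRINT=list; NSAMPLE=ns; SEED=5642; SAMPLE=Unitno;\
  NUMBERING=population; STRATUMFACTOR=Sampstrat]
PRINT Sampstrat,Unitno
" 0/1 indicator with one value for every unit in the population "
SVSAMPLE [PRINT=summary; NSAMPLE=ns; SEED=5642; SAMPLE=Insample;\
  METHOD=population]
PRINT Stratum,Unit,Insample
```

With `METHOD=sample` the saved variate has one value per sampled unit, so it is printed alongside `Sampstrat`; with `METHOD=population` it has one value per population unit, so it can be printed alongside the original vectors.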

Printed output is controlled by the `PRINT` option, with settings:

`list` | to list the sampled units, and |
---|---|
`summary` | to give a summary of the units sampled from each stratum. |

If you already have vectors containing the full data set, you can use `SVSAMPLE` to create a new data set containing only the sampled units. The `OLDVECTOR` parameter supplies the vectors from the full data set, and the `NEWVECTOR` parameter saves vectors with only the sampled units. In a stratified survey, you may supply original vectors that have only one value for each stratum. Each corresponding `NEWVECTOR` then takes the appropriate values corresponding to the `STRATUMFACTOR` levels to which the sampled units belong.
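
A sketch of the stratum-level case described above, assuming a hypothetical variate `Area` that holds one value per stratum (the values 120 and 340 are made up for illustration). `Samparea` should then hold, for each sampled unit, the `Area` value of that unit's stratum.

```genstat
VARIATE [VALUES=1...10] Unit; DECIMALS=0
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
" hypothetical stratum-level variable: one value for each stratum "
VARIATE [VALUES=120,340] Area
TABLE [CLASSIFICATION=Stratum; VALUES=2,3] ns
SVSAMPLE [PRINT=summary; NSAMPLE=ns; SEED=5642; STRATUMFACTOR=Sampstrat]\
  Unit,Area; NEWVECTOR=Sampunit,Samparea
PRINT Sampstrat,Sampunit,Samparea
```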

If you do not already have the vectors for the full population, you must supply the information to create them using options of `SVSAMPLE`. This is primarily intended for the situation where details of the strata are read into Genstat from a spreadsheet. The `SFLABELS` and `SFLEVELS` options define the labels and levels of the `STRATUMFACTOR`, respectively. The `NUNITS` option specifies the corresponding number of primary sampling units in the full population.
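
When only stratum details are available, a call might look like the sketch below. The stratum names, sizes and sample sizes are hypothetical stand-ins for values that would normally be read from a spreadsheet, and the seed is arbitrary.

```genstat
" hypothetical stratum details: labels, levels, sizes and sample sizes "
TEXT [VALUES=North,South] Region
VARIATE [VALUES=1,2] Level
VARIATE [VALUES=40,60] Npop
VARIATE [VALUES=4,6] Nsamp
SVSAMPLE [PRINT=summary,list; SAMPLE=Sampno; STRATUMFACTOR=Sampstrat;\
  NUNITS=Npop; NSAMPLE=Nsamp; SFLEVELS=Level; SFLABELS=Region; SEED=2718]
PRINT Sampstrat,Sampno
```

No `OLDVECTOR` is supplied here, so the result is the list of unit numbers to visit in each stratum rather than a subsetted data set.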

`SVSAMPLE` can be used to construct multistage samples. In the first stage of sampling, `NSAMPLE` should have one value for each stratum. For the next stage of sampling a second `SVSAMPLE` command should be given, with `NSAMPLE` now having one value for each of the sampling units from the first stage. This process can be repeated, as required, for samples with more than two stages.

Options: `PRINT`, `SAMPLE`, `STRATUMFACTOR`, `CLUSTERS`, `NUNITS`, `NSAMPLE`, `SFLEVELS`, `SFLABELS`, `METHOD`, `NUMBERING`, `SEED`.

Parameters: `OLDVECTOR`, `NEWVECTOR`.

### Action with `RESTRICT`

If `NSAMPLE` supplies a table, then any restriction on the classifying factor is taken to indicate units to be excluded from the random sampling process. This may be useful, for example, in a social survey where some people have previously indicated that they do not wish to take part in the survey, or in an ecological survey where some sites are inaccessible. Otherwise, any restrictions on the input vectors are ignored.
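
As an illustration of excluding units in this way, the sketch below restricts the classifying factor before sampling. It assumes, hypothetically, that units 3 and 8 are unavailable; the names and seed are also illustrative.

```genstat
VARIATE [VALUES=1...10] Unit; DECIMALS=0
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
TABLE [CLASSIFICATION=Stratum; VALUES=2,3] ns
" hypothetical: units 3 and 8 are unavailable, so restrict them out "
RESTRICT Stratum; CONDITION=(Unit.NE.3).AND.(Unit.NE.8)
SVSAMPLE [PRINT=list; NSAMPLE=ns; SEED=5642; SAMPLE=Insample;\
  METHOD=population]
" remove the restriction before further use of the factor "
RESTRICT Stratum
PRINT Stratum,Unit,Insample
```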

### See also

Procedures: `SVBOOT`, `SVCALIBRATE`, `SVGLM`, `SVHOTDECK`, `SVREWEIGHT`, `SVSTRATIFIED`, `SVTABULATE`, `SVWEIGHT`, `SAMPLE`.

Commands for: Calculations and manipulation, Survey analysis.

### Example

    CAPTION 'SVSAMPLE example'; STYLE=meta
    " stratified random sampling "
    VARIATE [VALUES=1...10] Unit; DECIMALS=0
    TEXT [VALUES=a,b,c,d,e,f,g,h,i,j] Label
    FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
    PRINT Stratum,Unit,Label
    " sample 2 from stratum 1, and 3 from stratum 2 "
    TABLE [CLASSIFICATION=Stratum; VALUES=2,3] ns
    SVSAMPLE [PRINT=summary,list; NSAMPLE=ns; SEED=5642; STRATUM=Sampstrat;\
      SAMPLE=Sampno] Unit,Label; NEWVECTOR=Sampunit,Samplabel
    PRINT Sampstrat,Sampunit,Samplabel,Sampno
    " two stage example, with simple random sampling at each level "
    VARIATE [VALUES=1...10] unit; DECIMALS=0
    TEXT [VALUES=a,b,c,d,e,f,g,h,i,j] label
    FACTOR [LEVELS=2; VALUES=4(1),6(2)] stratum
    FACTOR [LEVELS=5; LABELS=!t(ps1,ps2,ps3,ps4,ps5); VALUES=2(1...5)] psu
    PRINT stratum,unit,label,psu
    " 1) sample 1 psu from stratum 1, and 2 from stratum 2
      2) sample 1 of the 2 secondary units from each psu sampled at stage 1 "
    TABLE [CLASSIFICATION=stratum; VALUES=1,2] ns1
    TABLE [CLASSIFICATION=psu; VALUES=5(1)] ns2
    SVSAMPLE [PRINT=summary,list; NSAMPLE=ns1; SEED=5642; SAMPLE=sampled1;\
      METHOD=population; CLUSTERS=psu]
    PRINT stratum,unit,psu,sampled1
    " prevent sampling at stage 2 in units not sampled in stage 1 "
    TABULATE [CLASSIFICATION=psu] sampled1; MEAN=tsampled1
    CALCULATE ns21 = ns2 * tsampled1
    SVSAMPLE [NSAMPLE=ns21; SEED=0; SAMPLE=sampled2; METHOD=population]
    PRINT stratum,unit,label,psu,sampled1,sampled2