SVSAMPLE procedure

Constructs stratified random samples (S.D. Langton).

Options

`PRINT` = string token	Controls printed output (`list`, `summary`); default `summ`
`SAMPLE` = variate	Saves the sample, as unit numbers of sampled units when `METHOD=sample`, or as a logical (1 or 0) variable indicating sampled or unsampled units when `METHOD=population`
`STRATUMFACTOR` = factor	Saves the stratification factor
`CLUSTERS` = factor	Specifies a factor indicating groupings of units for a cluster sample; default `*` i.e. sample individual rows
`NUNITS` = table, scalar or variate	Numbers of units in the full data set for each level of the `STRATUMFACTOR`
`NSAMPLE` = table, scalar or variate	Numbers, or proportions, of units to sample for each level of the `STRATUMFACTOR`
`SFLEVELS` = variate	Levels for the stratum factor, if it has not already been declared
`SFLABELS` = text	Labels for the stratum factor, if it has not already been declared
`METHOD` = string token	Whether `SAMPLE` should contain the numbers of the units sampled from the population, or be a variate with a value for every unit of the full population containing 0 or 1 for unsampled and sampled units respectively (`population`, `sample`); default `samp`
`NUMBERING` = string token	Whether to number units within each stratum, or across the whole population (`withinstratum`, `population`); default `with`
`SEED` = scalar	Seed for the random number generator; default 0 i.e. continue from previous generation

Parameters

`OLDVECTOR` = variates, factors or texts	Data from the full survey
`NEWVECTOR` = variates, factors or texts	Data for the sample

Description

SVSAMPLE forms random samples for stratified random surveys. It can also be used to construct a new dataset containing only the sampled units. Groups of units can be sampled together, to allow cluster or multistage sampling.

SVSAMPLE is easiest to use when the vectors (variates, factors or texts) representing all the units in the population have already been created. For a single-stage sample, the number of units to be sampled is then specified by setting the NSAMPLE option to a table classified by the stratification factor; if the values of NSAMPLE are all less than 1, these are taken to be proportions to sample. NSAMPLE can be set to a scalar for unstratified samples. By default, individual units are sampled, but the CLUSTERS option can supply a factor to define clusters of units that are to be sampled together.
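
As a minimal sketch of the proportion form of NSAMPLE, the commands below reuse the ten-unit, two-stratum layout from the Example section. The identifiers nprop, Sampno and Sampunit are illustrative names, not part of the procedure. Both table values are below 1, so they should be read as sampling fractions: half of each stratum, i.e. 2 of the 4 units in stratum 1 and 3 of the 6 units in stratum 2.

```genstat
" illustrative population: 10 units, 4 in stratum 1 and 6 in stratum 2 "
VARIATE   [VALUES=1...10] Unit; DECIMALS=0
FACTOR    [LEVELS=2; VALUES=4(1),6(2)] Stratum
" both values are less than 1, so they are taken as proportions to sample "
TABLE     [CLASSIFICATION=Stratum; VALUES=0.5,0.5] nprop
SVSAMPLE  [PRINT=summary; NSAMPLE=nprop; SEED=5642; SAMPLE=Sampno]\
          Unit; NEWVECTOR=Sampunit
PRINT     Sampno,Sampunit
```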

The sample can be saved, in a variate, by the SAMPLE option. If the METHOD option is left at its default setting of sample, SAMPLE will contain the numbers of the sampled units. By default, the units are numbered separately within each stratum, but you can set option NUMBERING=population to number the units across the whole population. Alternatively, if you set option METHOD=population, the SAMPLE variate will have a value for every unit in the population: one for the sampled units and zero for the unsampled units. The STRATUMFACTOR option can save a factor indicating the stratum to which each sampled unit belongs.
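
The two ways of saving the sample can be compared with a sketch like the one below, again assuming the ten-unit layout from the Example section. Popno and Insample are hypothetical identifiers. The first call should save five unit numbers counted across the whole population; the second should save a variate of length ten holding 1 for sampled units and 0 otherwise. PRINT=* is the usual Genstat way of suppressing output, assumed here to apply to SVSAMPLE as well.

```genstat
VARIATE   [VALUES=1...10] Unit; DECIMALS=0
FACTOR    [LEVELS=2; VALUES=4(1),6(2)] Stratum
TABLE     [CLASSIFICATION=Stratum; VALUES=2,3] ns
" unit numbers of the sampled units, numbered across the whole population "
SVSAMPLE  [PRINT=list; NSAMPLE=ns; SEED=5642; NUMBERING=population;\
          SAMPLE=Popno]
PRINT     Popno
" 0/1 indicator with one value for every unit in the population "
SVSAMPLE  [PRINT=*; NSAMPLE=ns; SEED=5642; METHOD=population;\
          SAMPLE=Insample]
PRINT     Stratum,Unit,Insample
```

The indicator form is convenient when the result is to be combined with other population-length vectors, as in the two-stage example where it is tabulated by psu.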

Printed output is controlled by the PRINT option, with settings:

`list`	to list the sampled units, and
`summary`	to give a summary of the units sampled from each stratum.

If you already have vectors containing the full data set, you can use SVSAMPLE to create a new data set containing only the sampled units. The OLDVECTOR parameter supplies the vectors from the full data set, and the NEWVECTOR parameter saves vectors with only the sampled units. In a stratified survey, you may supply original vectors that have only one value for each stratum. Each corresponding NEWVECTOR then takes the appropriate values corresponding to the STRATUMFACTOR levels to which the sampled units belong.

If you do not already have the vectors for the full population, you must supply the information to create them using options of SVSAMPLE. This is primarily intended for the situation where details of the strata are read into Genstat from a spreadsheet. The SFLABELS and SFLEVELS options define the labels and levels of the STRATUMFACTOR, respectively. The NUNITS option specifies the corresponding number of primary sampling units in the full population.
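
A sketch of this usage, under stated assumptions, is given below. The region names, the stratum sizes and the identifiers Region, Nfarms, Ntake, Sampstrat and Sampno are all invented for illustration. In practice they might be read from a spreadsheet rather than typed in. It is also assumed that supplying SFLABELS alone is enough to define the stratum factor, and that variates are matched to the strata in order.

```genstat
" hypothetical strata details: labels, population sizes and sample sizes "
TEXT      [VALUES=North,South,West] Region
VARIATE   [VALUES=120,80,200] Nfarms; DECIMALS=0
VARIATE   [VALUES=12,8,20] Ntake; DECIMALS=0
" no population vectors exist, so the strata are described through options "
SVSAMPLE  [PRINT=summary; SFLABELS=Region; NUNITS=Nfarms; NSAMPLE=Ntake;\
          SEED=5642; STRATUMFACTOR=Sampstrat; SAMPLE=Sampno]
" with the default NUMBERING, unit numbers are counted within each stratum "
PRINT     Sampstrat,Sampno
```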

SVSAMPLE can be used to construct multistage samples. In the first stage of sampling, NSAMPLE should have one value for each stratum. For the next stage of sampling a second SVSAMPLE command should be given, with NSAMPLE now having one value for each of the sampling units from the first stage. This process can be repeated, as required, for samples with more than two stages.

Options: PRINT, SAMPLE, STRATUMFACTOR, CLUSTERS, NUNITS, NSAMPLE, SFLEVELS, SFLABELS, METHOD, NUMBERING, SEED.

Parameters: OLDVECTOR, NEWVECTOR.

Action with `RESTRICT`

If NSAMPLE supplies a table, then any restriction on the classifying factor is taken to indicate units to be excluded from the random sampling process. This may be useful, for example, in a social survey where some people have previously indicated that they do not wish to take part in the survey, or in an ecological survey where some sites are inaccessible. Otherwise, any restrictions on the input vectors are ignored.
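
The exclusion mechanism might be used as in the following sketch, which assumes the ten-unit layout from the Example section together with a hypothetical variate Willing recording which units have agreed to take part. Restricting the classifying factor of the NSAMPLE table should leave units 3 and 8 out of the random selection.

```genstat
VARIATE   [VALUES=1...10] Unit; DECIMALS=0
FACTOR    [LEVELS=2; VALUES=4(1),6(2)] Stratum
" hypothetical indicator: 0 marks units that declined to take part "
VARIATE   [VALUES=1,1,0,1,1,1,1,0,1,1] Willing; DECIMALS=0
TABLE     [CLASSIFICATION=Stratum; VALUES=2,3] ns
" restrict the classifying factor so that unwilling units are excluded "
RESTRICT  Stratum; CONDITION=Willing.EQ.1
SVSAMPLE  [PRINT=list; NSAMPLE=ns; SEED=5642; NUMBERING=population;\
          SAMPLE=Sampno]
" remove the restriction once sampling is complete "
RESTRICT  Stratum
PRINT     Sampno
```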

Example

CAPTION   'SVSAMPLE example'; STYLE=meta
" stratified random sampling "
VARIATE   [VALUES=1...10] Unit; DECIMALS=0
TEXT      [VALUES=a,b,c,d,e,f,g,h,i,j] Label
FACTOR    [LEVELS=2; VALUES=4(1),6(2)] Stratum
PRINT     Stratum,Unit,Label
" sample 2 from Stratum 1, and 3 from stratum 2 "
TABLE     [CLASSIFICATION=Stratum; VALUES=2,3] ns
SVSAMPLE  [PRINT=summary,list; NSAMPLE=ns; SEED=5642; STRATUM=Sampstrat;\
          SAMPLE=Sampno] Unit,Label; NEWVECTOR=Sampunit,Samplabel
PRINT     Sampstrat,Sampunit,Samplabel,Sampno

" two stage example, with simple random sampling at each level "
VARIATE   [VALUES=1...10] unit; DECIMALS=0
TEXT      [VALUES=a,b,c,d,e,f,g,h,i,j] label
FACTOR    [LEVELS=2; VALUES=4(1),6(2)] stratum
FACTOR    [LEVELS=5; LABELS=!t(ps1,ps2,ps3,ps4,ps5); VALUES=2(1...5)] psu
PRINT     stratum,unit,label,psu
" 1) sample 1 psu from stratum 1, and 2 from stratum 2
  2) sample 1 of the 2 secondary units from each psu sampled at stage 1 "
TABLE     [CLASSIFICATION=stratum; VALUES=1,2] ns1
TABLE     [CLASSIFICATION=psu; VALUES=5(1)] ns2
SVSAMPLE  [PRINT=summary,list; NSAMPLE=ns1; SEED=5642; SAMPLE=sampled1;\
          METHOD=population; CLUSTERS=psu]
PRINT     stratum,unit,psu,sampled1
" prevent sampling at stage 2 in units not sampled in stage 1 "
TABULATE  [CLASSIFICATION=psu] sampled1; MEAN=tsampled1
CALCULATE ns21 = ns2 * tsampled1  
SVSAMPLE  [NSAMPLE=ns21; SEED=0; SAMPLE=sampled2; METHOD=population] 
PRINT     stratum,unit,label,psu,sampled1,sampled2

Updated on March 5, 2019
