Constructs stratified random samples (S.D. Langton).
|PRINT|Controls printed output (list, summary)|
|SAMPLE|Saves the sample, as unit numbers of the sampled units or, when METHOD=population, as a variate of ones and zeros with a value for every unit in the population|
|STRATUMFACTOR|Saves the stratification factor|
|CLUSTERS|Specifies a factor indicating groupings of units for a cluster sample; by default individual units are sampled|
|NUNITS|Numbers of units in the full data set for each level of the STRATUMFACTOR|
|NSAMPLE|Numbers, or proportions, of units to sample for each level of the STRATUMFACTOR|
|SFLEVELS|Levels for the stratum factor, if it has not already been declared|
|SFLABELS|Labels for the stratum factor, if it has not already been declared|
|NUMBERING|Whether to number units within each stratum (the default), or across the whole population (NUMBERING=population)|
|SEED|Seed for the random number generator; default 0 i.e. continue from previous generation|
|OLDVECTOR|Data from the full survey|
|NEWVECTOR|Data for the sample|
SVSAMPLE forms random samples for stratified random surveys. It can also be used to construct a new dataset containing only the sampled units. Groups of units can be sampled together, to allow cluster or multistage sampling.
SVSAMPLE is easiest to use when the vectors (variates, factors or texts) representing all the units in the population have already been created. For a single-stage sample, the number of units to be sampled is then specified by setting the NSAMPLE option to a table classified by the stratification factor; if the values of NSAMPLE are all less than 1, these are taken to be proportions to sample. NSAMPLE can be set to a scalar for unstratified samples. By default, individual units are sampled, but the CLUSTERS option can supply a factor to define clusters of units that are to be sampled together.
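The rule for counts versus proportions can be illustrated outside Genstat. The following Python sketch is not SVSAMPLE's implementation: the function name `stratified_sample`, the dictionary form of the sample sizes, and the rounding of a proportion to a whole number of units are all assumptions made for illustration.

```python
import random

def stratified_sample(strata, nsample, seed=None):
    """Return 0-based positions of sampled units (hypothetical helper).

    strata  : stratum of every unit in the population
    nsample : dict mapping stratum -> count, or -> proportion
              when every value is less than 1
    """
    rng = random.Random(seed)
    as_proportions = all(v < 1 for v in nsample.values())
    chosen = []
    for level in sorted(nsample):
        members = [i for i, s in enumerate(strata) if s == level]
        n = nsample[level]
        if as_proportions:
            # Assumption of this sketch: round to the nearest whole unit
            n = round(n * len(members))
        chosen.extend(sorted(rng.sample(members, n)))
    return chosen

strata = [1] * 4 + [2] * 6                      # 4 units in stratum 1, 6 in stratum 2
by_count = stratified_sample(strata, {1: 2, 2: 3}, seed=5642)
by_prop = stratified_sample(strata, {1: 0.5, 2: 0.5}, seed=5642)
```

With these inputs, both calls request two units from stratum 1 and three from stratum 2, once as counts and once as proportions.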
The sample can be saved, in a variate, by the SAMPLE option. If the METHOD option is left at its default setting, SAMPLE will contain the numbers of the sampled units. By default, the units are numbered separately within each stratum, but you can set option NUMBERING=population to number the units across the whole population. Alternatively, if option METHOD=population is set, the SAMPLE variate will have a value for every unit in the population; this stores the value one for the sampled units and zero for the unsampled units. The STRATUMFACTOR option can save a factor indicating the stratum to which each sampled unit belongs.
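The three ways of recording the same sample can be compared side by side. This Python sketch only illustrates the representations described above; the sampled positions are hypothetical and all variable names are invented for the example.

```python
strata = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
sampled = [1, 3, 4, 7, 9]        # hypothetical 0-based positions of sampled units

# Indicator form: one value per unit in the population (1 = sampled, 0 = not)
indicator = [1 if i in sampled else 0 for i in range(len(strata))]

# Unit numbers counted across the whole population (1-based)
population_numbers = [i + 1 for i in sampled]

def within_stratum_number(i):
    """1-based position of unit i among the units of its own stratum."""
    return sum(1 for j in range(i + 1) if strata[j] == strata[i])

# Unit numbers counted separately within each stratum (1-based)
stratum_numbers = [within_stratum_number(i) for i in sampled]

# Stratum of each sampled unit
sample_strata = [strata[i] for i in sampled]

print(indicator)            # [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
print(population_numbers)   # [2, 4, 5, 8, 10]
print(stratum_numbers)      # [2, 4, 1, 4, 6]
print(sample_strata)        # [1, 1, 2, 2, 2]
```

Within-stratum numbers are only meaningful together with the stratum of each sampled unit, which is why the two are shown as a pair.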
Printed output is controlled by the PRINT option, with settings:
|list|to list the sampled units, and|
|summary|to give a summary of the units sampled from each stratum.|
If you already have vectors containing the full data set, you can use SVSAMPLE to create a new data set containing only the sampled units. The OLDVECTOR parameter supplies the vectors from the full data set, and the NEWVECTOR parameter saves vectors with only the sampled units. In a stratified survey, you may supply original vectors that have only one value for each stratum. Each corresponding NEWVECTOR then takes the appropriate values corresponding to the STRATUMFACTOR levels to which the sampled units belong.
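The two kinds of original vector (one value per unit, or one value per stratum) can be sketched as follows. This is an illustrative Python outline, not Genstat code; the helper `subset_vector` and the `region` data are hypothetical.

```python
def subset_vector(old, sampled, strata):
    """Build the sample's copy of a vector (hypothetical helper).

    old     : one value per population unit, or one value per stratum
    sampled : 0-based positions of the sampled units
    strata  : stratum of every unit in the population
    """
    levels = sorted(set(strata))
    if len(old) == len(strata):
        # Unit-level vector: keep the values of the sampled units
        return [old[i] for i in sampled]
    if len(old) == len(levels):
        # Stratum-level vector: look up the value for each sampled unit's stratum
        return [old[levels.index(strata[i])] for i in sampled]
    raise ValueError("vector length matches neither units nor strata")

strata = [1] * 4 + [2] * 6
sampled = [1, 3, 4, 7, 9]                 # hypothetical sampled positions
label = list("abcdefghij")                # one value per unit
region = ["north", "south"]               # one value per stratum

print(subset_vector(label, sampled, strata))   # ['b', 'd', 'e', 'h', 'j']
print(subset_vector(region, sampled, strata))  # ['north', 'north', 'south', 'south', 'south']
```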
If you do not already have the vectors for the full population, you must supply the information to create them using options of SVSAMPLE. This is primarily intended for the situation where details of the strata are read into Genstat from a spreadsheet. The SFLABELS and SFLEVELS options define the labels and levels of the STRATUMFACTOR, respectively. The NUNITS option specifies the corresponding number of primary sampling units in the full population.
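The information needed to describe a population from stratum details alone is small: a level, a label and a unit count for each stratum. The Python sketch below shows how a full-length stratum factor could be expanded from such a summary. It is an assumption about the general idea, not a description of what SVSAMPLE does internally, and `build_population` and the labels are hypothetical.

```python
def build_population(levels, labels, nunits):
    """Expand per-stratum details into one entry per population unit."""
    stratum_level = []
    stratum_label = []
    for level, label, n in zip(levels, labels, nunits):
        stratum_level.extend([level] * n)
        stratum_label.extend([label] * n)
    return stratum_level, stratum_label

# Two strata with 4 and 6 primary sampling units respectively
pop_levels, pop_labels = build_population([1, 2], ["urban", "rural"], [4, 6])

print(pop_levels)   # [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
```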
SVSAMPLE can be used to construct multistage samples. In the first stage of sampling, NSAMPLE should have one value for each stratum. For the next stage of sampling a second SVSAMPLE command should be given, with NSAMPLE now having one value for each of the sampling units from the first stage. This process can be repeated, as required, for samples with more than two stages.
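The two-stage logic can be sketched in Python to show how the second stage depends on the first. This is an illustration under simple random sampling at each stage, not Genstat code; `sample_within` and the data layout are hypothetical.

```python
import random

def sample_within(groups, n_per_group, rng):
    """Pick n_per_group[g] members at random from each listed group g."""
    picked = []
    for g in sorted(n_per_group):
        picked.extend(rng.sample(groups[g], n_per_group[g]))
    return sorted(picked)

rng = random.Random(5642)
stratum = [1] * 4 + [2] * 6
psu = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]      # primary sampling unit of each unit

# Stage 1: one sample size per stratum; the items sampled are PSUs
psus_by_stratum = {}
for s, p in zip(stratum, psu):
    members = psus_by_stratum.setdefault(s, [])
    if p not in members:
        members.append(p)
stage1 = sample_within(psus_by_stratum, {1: 1, 2: 2}, rng)

# Stage 2: one sample size per PSU chosen at stage 1; the items sampled are units
units_by_psu = {}
for i, p in enumerate(psu):
    units_by_psu.setdefault(p, []).append(i)
stage2 = sample_within(units_by_psu, {p: 1 for p in stage1}, rng)
```

Only PSUs selected at stage 1 receive a sample size at stage 2, so units in unselected PSUs can never be drawn.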
If NSAMPLE supplies a table, then any restriction on the classifying factor is taken to indicate units to be excluded from the random sampling process. This may be useful, for example, in a social survey where some people have previously indicated that they do not wish to take part in the survey, or in an ecological survey where some sites are inaccessible. Otherwise, any restrictions on the input vectors are ignored.
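The effect of excluding units can be pictured as removing them from each stratum's pool before drawing. The Python sketch below illustrates that idea only; it does not model Genstat's restriction mechanism, and `sample_with_exclusions` and the excluded positions are hypothetical.

```python
import random

def sample_with_exclusions(strata, nsample, excluded, seed=None):
    """Stratified sample that never selects an excluded unit."""
    rng = random.Random(seed)
    chosen = []
    for level in sorted(nsample):
        eligible = [i for i, s in enumerate(strata)
                    if s == level and i not in excluded]
        chosen.extend(rng.sample(eligible, nsample[level]))
    return sorted(chosen)

strata = [1] * 4 + [2] * 6
excluded = {0, 5, 6}          # e.g. people who declined, or inaccessible sites
sample = sample_with_exclusions(strata, {1: 2, 2: 3}, excluded, seed=1)
```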
CAPTION 'SVSAMPLE example'; STYLE=meta
" stratified random sampling "
VARIATE [VALUES=1...10] Unit; DECIMALS=0
TEXT [VALUES=a,b,c,d,e,f,g,h,i,j] Label
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
PRINT Stratum,Unit,Label
" sample 2 from Stratum 1, and 3 from stratum 2 "
TABLE [CLASSIFICATION=Stratum; VALUES=2,3] ns
SVSAMPLE [PRINT=summary,list; NSAMPLE=ns; SEED=5642; STRATUM=Sampstrat;\
         SAMPLE=Sampno] Unit,Label; NEWVECTOR=Sampunit,Samplabel
PRINT Sampstrat,Sampunit,Samplabel,Sampno
" two stage example, with simple random sampling at each level "
VARIATE [VALUES=1...10] unit; DECIMALS=0
TEXT [VALUES=a,b,c,d,e,f,g,h,i,j] label
FACTOR [LEVELS=2; VALUES=4(1),6(2)] stratum
FACTOR [LEVELS=5; LABELS=!t(ps1,ps2,ps3,ps4,ps5); VALUES=2(1...5)] psu
PRINT stratum,unit,label,psu
" 1) sample 1 psu from stratum 1, and 2 from stratum 2
  2) sample 1 of the 2 secondary units from each psu sampled at stage 1 "
TABLE [CLASSIFICATION=stratum; VALUES=1,2] ns1
TABLE [CLASSIFICATION=psu; VALUES=5(1)] ns2
SVSAMPLE [PRINT=summary,list; NSAMPLE=ns1; SEED=5642; SAMPLE=sampled1;\
         METHOD=population; CLUSTERS=psu]
PRINT stratum,unit,psu,sampled1
" prevent sampling at stage 2 in units not sampled in stage 1 "
TABULATE [CLASSIFICATION=psu] sampled1; MEAN=tsampled1
CALCULATE ns21 = ns2 * tsampled1
SVSAMPLE [NSAMPLE=ns21; SEED=0; SAMPLE=sampled2; METHOD=population]
PRINT stratum,unit,label,psu,sampled1,sampled2