Constructs stratified random samples (S.D. Langton).
Options
PRINT = string token | Controls printed output (list, summary); default summ |
---|---|
SAMPLE = variate | Saves the sample, as unit numbers of sampled units when METHOD=sample, or as a logical (1 or 0) variable indicating sampled or unsampled units when METHOD=population |
STRATUMFACTOR = factor | Saves the stratification factor |
CLUSTERS = factor | Specifies a factor indicating groupings of units for a cluster sample; default * i.e. sample individual rows |
NUNITS = table, scalar or variate | Numbers of units in the full data set for each level of the STRATUMFACTOR |
NSAMPLE = table, scalar or variate | Numbers, or proportions, of units to sample for each level of the STRATUMFACTOR |
SFLEVELS = variate | Levels for the stratum factor, if it has not already been declared |
SFLABELS = text | Labels for the stratum factor, if it has not already been declared |
METHOD = string token | Whether SAMPLE should contain the numbers of the units sampled from the population, or be a variate with a value for every unit of the full population containing 0 or 1 for unsampled and sampled units respectively (population, sample); default samp |
NUMBERING = string token | Whether to number units within each stratum, or across the whole population (withinstratum, population); default with |
SEED = scalar | Seed for the random number generator; default 0 i.e. continue from previous generation |
Parameters
OLDVECTOR = variates, factors or texts | Data from the full survey |
---|---|
NEWVECTOR = variates, factors or texts | Data for the sample |
Description
SVSAMPLE forms random samples for stratified random surveys. It can also be used to construct a new dataset containing only the sampled units. Groups of units can be sampled together, to allow cluster or multistage sampling.
SVSAMPLE is easiest to use when the vectors (variates, factors or texts) representing all the units in the population have already been created. For a single-stage sample, the number of units to be sampled is then specified by setting the NSAMPLE option to a table classified by the stratification factor; if the values of NSAMPLE are all less than 1, these are taken to be proportions to sample. NSAMPLE can be set to a scalar for unstratified samples. By default, individual units are sampled, but the CLUSTERS option can supply a factor to define clusters of units that are to be sampled together.
The sample can be saved, in a variate, by the SAMPLE option. If the METHOD option is left at its default setting of sample, SAMPLE will contain the numbers of the sampled units. By default, the units are numbered separately within each stratum, but you can set option NUMBERING=population to number the units across the whole population. Alternatively, if option METHOD=population is set, the SAMPLE variate will have a value for every unit in the population: one for the sampled units and zero for the unsampled units. The STRATUMFACTOR option can save a factor indicating the stratum to which each sampled unit belongs.
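The two forms of saved output can be compared with a short sketch such as the one below. It assumes the same ten-unit, two-stratum population as the Example section; the identifiers Sampno, Sampstrat and Insample are hypothetical names used only here.

```genstat
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
TABLE [CLASSIFICATION=Stratum; VALUES=2,3] ns
" METHOD=sample (the default): unit numbers, here counted across the whole population "
SVSAMPLE [PRINT=summary; NSAMPLE=ns; SEED=5642; SAMPLE=Sampno;\
  NUMBERING=population; STRATUMFACTOR=Sampstrat]
PRINT Sampstrat,Sampno
" METHOD=population: one 0/1 value for every unit in the population "
SVSAMPLE [PRINT=summary; NSAMPLE=ns; SEED=5642; SAMPLE=Insample;\
  METHOD=population]
PRINT Stratum,Insample
```

The first call should give a variate with one value per sampled unit, and the second a variate of the same length as Stratum.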
Printed output is controlled by the PRINT option, with settings:
list | to list the sampled units, and |
---|---|
summary | to give a summary of the units sampled from each stratum. |
If you already have vectors containing the full data set, you can use SVSAMPLE to create a new data set containing only the sampled units. The OLDVECTOR parameter supplies the vectors from the full data set, and the NEWVECTOR parameter saves vectors with only the sampled units. In a stratified survey, you may supply original vectors that have only one value for each stratum. Each corresponding NEWVECTOR then takes the appropriate values corresponding to the STRATUMFACTOR levels to which the sampled units belong.
If you do not already have the vectors for the full population, you must supply the information to create them using options of SVSAMPLE. This is primarily intended for the situation where details of the strata are read into Genstat from a spreadsheet. The SFLABELS and SFLEVELS options define the labels and levels of the STRATUMFACTOR, respectively. The NUNITS option specifies the corresponding number of primary sampling units in the full population.
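A sketch of this usage is given below. It relies only on the option descriptions above and makes several assumptions: the strata, their sizes and the sample sizes are invented for illustration, and the identifiers Region, Npop, Nsamp, Sampstrat and Sampno are hypothetical. In practice the first three would typically be read from a spreadsheet rather than declared by hand.

```genstat
" hypothetical strata, with population sizes and required sample sizes "
TEXT [VALUES=North,South,East] Region
VARIATE [VALUES=120,80,200] Npop
VARIATE [VALUES=12,8,20] Nsamp
" no population vectors exist, so the strata are defined through the options "
SVSAMPLE [PRINT=summary; SFLABELS=Region; NUNITS=Npop; NSAMPLE=Nsamp;\
  SEED=5642; STRATUMFACTOR=Sampstrat; SAMPLE=Sampno]
PRINT Sampstrat,Sampno
```

Here Sampstrat is expected to identify the stratum of each sampled unit, with Sampno holding the unit number within that stratum under the default NUMBERING setting.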
SVSAMPLE can be used to construct multistage samples. In the first stage of sampling, NSAMPLE should have one value for each stratum. For the next stage of sampling a second SVSAMPLE command should be given, with NSAMPLE now having one value for each of the sampling units from the first stage. This process can be repeated, as required, for samples with more than two stages.
Options: PRINT, SAMPLE, STRATUMFACTOR, CLUSTERS, NUNITS, NSAMPLE, SFLEVELS, SFLABELS, METHOD, NUMBERING, SEED.
Parameters: OLDVECTOR, NEWVECTOR.
Action with RESTRICT
If NSAMPLE supplies a table, then any restriction on the classifying factor is taken to indicate units to be excluded from the random sampling process. This may be useful, for example, in a social survey where some people have previously indicated that they do not wish to take part in the survey, or in an ecological survey where some sites are inaccessible. Otherwise, any restrictions on the input vectors are ignored.
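The exclusion mechanism might be used as in the following sketch, which assumes the ten-unit population from the Example section and, purely for illustration, that units 3 and 7 cannot be surveyed. The identifier Sampno is a hypothetical name.

```genstat
VARIATE [VALUES=1...10] Unit
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
TABLE [CLASSIFICATION=Stratum; VALUES=2,3] ns
" restrict the classifying factor so that units 3 and 7 are excluded "
RESTRICT Stratum; CONDITION=(Unit.NE.3).AND.(Unit.NE.7)
SVSAMPLE [PRINT=summary,list; NSAMPLE=ns; SEED=5642; SAMPLE=Sampno;\
  NUMBERING=population]
" cancel the restriction once the sample has been drawn "
RESTRICT Stratum
```

The restriction is placed on Stratum because that is the factor classifying the NSAMPLE table; restrictions on other vectors would be ignored.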
See also
Procedures: SVBOOT, SVCALIBRATE, SVGLM, SVHOTDECK, SVREWEIGHT, SVSTRATIFIED, SVTABULATE, SVWEIGHT, SAMPLE.
Commands for: Calculations and manipulation, Survey analysis.
Example
CAPTION 'SVSAMPLE example'; STYLE=meta
" stratified random sampling "
VARIATE [VALUES=1...10] Unit; DECIMALS=0
TEXT [VALUES=a,b,c,d,e,f,g,h,i,j] Label
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
PRINT Stratum,Unit,Label
" sample 2 from stratum 1, and 3 from stratum 2 "
TABLE [CLASSIFICATION=Stratum; VALUES=2,3] ns
SVSAMPLE [PRINT=summary,list; NSAMPLE=ns; SEED=5642; STRATUMFACTOR=Sampstrat;\
  SAMPLE=Sampno] Unit,Label; NEWVECTOR=Sampunit,Samplabel
PRINT Sampstrat,Sampunit,Samplabel,Sampno
" two stage example, with simple random sampling at each level "
VARIATE [VALUES=1...10] unit; DECIMALS=0
TEXT [VALUES=a,b,c,d,e,f,g,h,i,j] label
FACTOR [LEVELS=2; VALUES=4(1),6(2)] stratum
FACTOR [LEVELS=5; LABELS=!t(ps1,ps2,ps3,ps4,ps5); VALUES=2(1...5)] psu
PRINT stratum,unit,label,psu
" 1) sample 1 psu from stratum 1, and 2 from stratum 2
  2) sample 1 of the 2 secondary units from each psu sampled at stage 1 "
TABLE [CLASSIFICATION=stratum; VALUES=1,2] ns1
TABLE [CLASSIFICATION=psu; VALUES=5(1)] ns2
SVSAMPLE [PRINT=summary,list; NSAMPLE=ns1; SEED=5642; SAMPLE=sampled1;\
  METHOD=population; CLUSTERS=psu]
PRINT stratum,unit,psu,sampled1
" prevent sampling at stage 2 in units not sampled in stage 1 "
TABULATE [CLASSIFICATION=psu] sampled1; MEAN=tsampled1
CALCULATE ns21 = ns2 * tsampled1
SVSAMPLE [NSAMPLE=ns21; SEED=0; SAMPLE=sampled2; METHOD=population]
PRINT stratum,unit,label,psu,sampled1,sampled2