Constructs stratified random samples (S.D. Langton).
Options
| PRINT = string token | Controls printed output (list, summary); default summ |
|---|---|
| SAMPLE = variate | Saves the sample, as unit numbers of sampled units when METHOD=sample, or as a logical (1 or 0) variable indicating sampled or unsampled units when METHOD=population |
| STRATUMFACTOR = factor | Saves the stratification factor |
| CLUSTERS = factor | Specifies a factor indicating groupings of units for a cluster sample; default * i.e. sample individual rows |
| NUNITS = table, scalar or variate | Numbers of units in the full data set for each level of the STRATUMFACTOR |
| NSAMPLE = table, scalar or variate | Numbers, or proportions, of units to sample for each level of the STRATUMFACTOR |
| SFLEVELS = variate | Levels for the stratum factor, if it has not already been declared |
| SFLABELS = text | Labels for the stratum factor, if it has not already been declared |
| METHOD = string token | Whether SAMPLE should contain the numbers of the units sampled from the population, or be a variate with a value for every unit of the full population containing 0 or 1 for unsampled and sampled units respectively (population, sample); default samp |
| NUMBERING = string token | Whether to number units within each stratum, or across the whole population (withinstratum, population); default with |
| SEED = scalar | Seed for the random number generator; default 0 i.e. continue from previous generation |
Parameters
| OLDVECTOR = variates, factors or texts | Data from the full survey |
|---|---|
| NEWVECTOR = variates, factors or texts | Data for the sample |
Description
SVSAMPLE forms random samples for stratified random surveys. It can also be used to construct a new dataset containing only the sampled units. Groups of units can be sampled together, to allow cluster or multistage sampling.
SVSAMPLE is easiest to use when the vectors (variates, factors or texts) representing all the units in the population have already been created. For a single-stage sample, the number of units to be sampled is then specified by setting the NSAMPLE option to a table classified by the stratification factor; if the values of NSAMPLE are all less than 1, these are taken to be proportions to sample. NSAMPLE can be set to a scalar for unstratified samples. By default, individual units are sampled, but the CLUSTERS option can supply a factor to define clusters of units that are to be sampled together.
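As a minimal sketch of the proportion form of NSAMPLE, the following commands sample half of the units in each stratum. The identifiers (Stratum, pr, Sampno) and the seed are illustrative, and the small population mirrors the one used in the Example section.

```
" illustrative population: 10 units in 2 strata of sizes 4 and 6 "
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
" all values are less than 1, so they are taken to be proportions "
TABLE [CLASSIFICATION=Stratum; VALUES=0.5,0.5] pr
SVSAMPLE [PRINT=summary,list; NSAMPLE=pr; SEED=5642; SAMPLE=Sampno]
```

With strata of 4 and 6 units, a proportion of 0.5 corresponds to 2 and 3 sampled units. The stratum sizes were chosen so that the products are whole numbers, because the handling of fractional results is not described here.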
The sample can be saved, in a variate, by the SAMPLE option. If the METHOD option is left at its default setting of sample, SAMPLE will contain the numbers of the sampled units. By default, the units are numbered separately within each stratum, but you can set option NUMBERING=population to number the units across the whole population. Alternatively, if option METHOD=population, the SAMPLE variate will have a value for every unit in the population, storing one for the sampled units and zero for the unsampled units. The STRATUMFACTOR option can save a factor indicating the stratum to which each sampled unit belongs.
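The two ways of saving the sample can be compared with a short sketch. It uses only the options described above; the identifiers (Stratum, ns, Popno, Sampstrat, Insample) and the seed are illustrative assumptions.

```
" illustrative population: 10 units in 2 strata of sizes 4 and 6 "
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
TABLE [CLASSIFICATION=Stratum; VALUES=2,3] ns
" METHOD=sample (the default): save unit numbers, counted across the
  whole population, with the stratum of each sampled unit "
SVSAMPLE [NSAMPLE=ns; SEED=5642; SAMPLE=Popno; STRATUMFACTOR=Sampstrat;\
  NUMBERING=population]
PRINT Sampstrat,Popno
" METHOD=population: save a 0/1 indicator with a value for every unit "
SVSAMPLE [NSAMPLE=ns; SEED=5642; SAMPLE=Insample; METHOD=population]
PRINT Stratum,Insample
```

Here Popno and Sampstrat should have one value per sampled unit (five in total), whereas Insample has one value for each of the ten units in the population.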
Printed output is controlled by the PRINT option, with settings:
| list | to list the sampled units, and |
|---|---|
| summary | to give a summary of the units sampled from each stratum. |
If you already have vectors containing the full data set, you can use SVSAMPLE to create a new data set containing only the sampled units. The OLDVECTOR parameter supplies the vectors from the full data set, and the NEWVECTOR parameter saves vectors with only the sampled units. In a stratified survey, you may supply original vectors that have only one value for each stratum. Each corresponding NEWVECTOR then takes the appropriate values corresponding to the STRATUMFACTOR levels to which the sampled units belong.
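The sketch below illustrates both kinds of OLDVECTOR: a unit-level variate (Unit) and a stratum-level variate (Area) with one value per stratum. Area and its values are hypothetical, as are the other identifiers and the seed.

```
" illustrative population: 10 units in 2 strata of sizes 4 and 6 "
VARIATE [VALUES=1...10] Unit; DECIMALS=0
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
" hypothetical stratum-level variable: one value for each stratum "
VARIATE [VALUES=120,75] Area
TABLE [CLASSIFICATION=Stratum; VALUES=2,3] ns
SVSAMPLE [NSAMPLE=ns; SEED=5642; STRATUMFACTOR=Sampstrat]\
  OLDVECTOR=Unit,Area; NEWVECTOR=Sampunit,Samparea
PRINT Sampstrat,Sampunit,Samparea
```

Following the description above, Samparea would be expected to hold 120 for units sampled from stratum 1 and 75 for units sampled from stratum 2.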
If you do not already have the vectors for the full population, you must supply the information to create them using options of SVSAMPLE. This is primarily intended for the situation where details of the strata are read into Genstat from a spreadsheet. The SFLABELS and SFLEVELS options define the labels and levels of the STRATUMFACTOR, respectively. The NUNITS option specifies the corresponding number of primary sampling units in the full population.
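A minimal sketch of this usage follows, assuming that only stratum-level details are available. The stratum labels, sizes and sample sizes are invented for illustration, as are the identifiers (Region, Slevel, Npop, Nsamp) and the seed; in practice these vectors would typically be read from a spreadsheet.

```
" hypothetical stratum details: three strata "
TEXT [VALUES=north,central,south] Region
VARIATE [VALUES=1,2,3] Slevel
" numbers of primary sampling units in the full population "
VARIATE [VALUES=40,25,10] Npop
" numbers of units to sample from each stratum "
VARIATE [VALUES=4,3,2] Nsamp
SVSAMPLE [PRINT=summary,list; SFLEVELS=Slevel; SFLABELS=Region;\
  NUNITS=Npop; NSAMPLE=Nsamp; SEED=7291; SAMPLE=Sampno;\
  STRATUMFACTOR=Sampstrat]
PRINT Sampstrat,Sampno
```

Sampno then holds the numbers of the sampled units, numbered within each stratum by default, and Sampstrat identifies the stratum of each.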
SVSAMPLE can be used to construct multistage samples. In the first stage of sampling, NSAMPLE should have one value for each stratum. For the next stage of sampling a second SVSAMPLE command should be given, with NSAMPLE now having one value for each of the sampling units from the first stage. This process can be repeated, as required, for samples with more than two stages.
Options: PRINT, SAMPLE, STRATUMFACTOR, CLUSTERS, NUNITS, NSAMPLE, SFLEVELS, SFLABELS, METHOD, NUMBERING, SEED.
Parameters: OLDVECTOR, NEWVECTOR.
Action with RESTRICT
If NSAMPLE supplies a table, then any restriction on the classifying factor is taken to indicate units to be excluded from the random sampling process. This may be useful, for example, in a social survey where some people have previously indicated that they do not wish to take part in the survey, or in an ecological survey where some sites are inaccessible. Otherwise, any restrictions on the input vectors are ignored.
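As a sketch of this behaviour, the commands below restrict the classifying factor so that two units are excluded before sampling. The choice of excluded units (3 and 8), the identifiers and the seed are hypothetical.

```
" illustrative population: 10 units in 2 strata of sizes 4 and 6 "
VARIATE [VALUES=1...10] Unit; DECIMALS=0
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
TABLE [CLASSIFICATION=Stratum; VALUES=2,3] ns
" hypothetical exclusions: units 3 and 8 cannot be sampled "
RESTRICT Stratum; CONDITION=(Unit.NE.3).AND.(Unit.NE.8)
SVSAMPLE [PRINT=summary,list; NSAMPLE=ns; SEED=5642; SAMPLE=Sampno;\
  METHOD=population]
" remove the restriction once sampling is complete "
RESTRICT Stratum
```

The restriction is placed on Stratum, the classifying factor of the NSAMPLE table, because restrictions on other input vectors are ignored.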
See also
Procedures: SVBOOT, SVCALIBRATE, SVGLM, SVHOTDECK, SVREWEIGHT, SVSTRATIFIED, SVTABULATE, SVWEIGHT, SAMPLE.
Commands for: Calculations and manipulation, Survey analysis.
Example
CAPTION 'SVSAMPLE example'; STYLE=meta
" stratified random sampling "
VARIATE [VALUES=1...10] Unit; DECIMALS=0
TEXT [VALUES=a,b,c,d,e,f,g,h,i,j] Label
FACTOR [LEVELS=2; VALUES=4(1),6(2)] Stratum
PRINT Stratum,Unit,Label
" sample 2 from Stratum 1, and 3 from stratum 2 "
TABLE [CLASSIFICATION=Stratum; VALUES=2,3] ns
SVSAMPLE [PRINT=summary,list; NSAMPLE=ns; SEED=5642; STRATUM=Sampstrat;\
SAMPLE=Sampno] Unit,Label; NEWVECTOR=Sampunit,Samplabel
PRINT Sampstrat,Sampunit,Samplabel,Sampno
" two stage example, with simple random sampling at each level "
VARIATE [VALUES=1...10] unit; DECIMALS=0
TEXT [VALUES=a,b,c,d,e,f,g,h,i,j] label
FACTOR [LEVELS=2; VALUES=4(1),6(2)] stratum
FACTOR [LEVELS=5; LABELS=!t(ps1,ps2,ps3,ps4,ps5); VALUES=2(1...5)] psu
PRINT stratum,unit,label,psu
" 1) sample 1 psu from stratum 1, and 2 from stratum 2
2) sample 1 of the 2 secondary units from each psu sampled at stage 1 "
TABLE [CLASSIFICATION=stratum; VALUES=1,2] ns1
TABLE [CLASSIFICATION=psu; VALUES=5(1)] ns2
SVSAMPLE [PRINT=summary,list; NSAMPLE=ns1; SEED=5642; SAMPLE=sampled1;\
METHOD=population; CLUSTERS=psu]
PRINT stratum,unit,psu,sampled1
" prevent sampling at stage 2 in units not sampled in stage 1 "
TABULATE [CLASSIFICATION=psu] sampled1; MEAN=tsampled1
CALCULATE ns21 = ns2 * tsampled1
SVSAMPLE [NSAMPLE=ns21; SEED=0; SAMPLE=sampled2; METHOD=population]
PRINT stratum,unit,label,psu,sampled1,sampled2