Analyses stratified random surveys by expansion or ratio raising (S.D. Langton).
Options
| Option | Description |
|---|---|
| PRINT = string token | Controls printed output (summary, totals, means, influence, ratios, extra); default summ, tota, infl |
| PLOT = string token | Controls which high-resolution graphs are plotted (single, separate); default * i.e. none |
| XMISSING = string token | Action if x-variable contains missing values (estimate, fault); default esti |
| RESTRICTED = string token | Action with restricted (or filtered) observations (omit, add); default omit |
| STRATUMFACTOR = factor | Stratification factor; default * i.e. unstratified |
| NINFLUENCE = scalar | Number of influential points to print; default 10 |
| METHOD = string token | Method for ratio analysis (separate, combined, classicalcombined); default sepa |
| SAVESUMMARY = string token | Whether to save just the overall summaries instead of those for each stratum (yes, no); default no |
| COMBINEDSTRATUM = scalar | Stratum for which the ratio should be set to the combined ratio estimate; default * |
| ROWS = scalars | Number of rows of plot-matrix; default * i.e. set automatically depending on number of levels of STRATUMFACTOR |
| COLUMNS = scalars | Number of columns of plot-matrix; default * i.e. set automatically depending on number of levels of STRATUMFACTOR |
| NBOOT = scalar | Number of bootstrap samples to use; default 0 |
| SEED = scalar | Seed for random number generator for bootstrap; default 0 |
| CIPROBABILITY = scalars | The probability level for the confidence intervals; default 0.95 |
| CIMETHOD = string token | Method for forming confidence intervals (automatic, tdistribution, percentile); default auto |
| COMPACT = string token | Whether to produce output in a compact (plaintext) format (yes, no); default no |
Parameters
| Parameter | Description |
|---|---|
| Y = variates | Response data |
| X = variates | Base data; if unset expansion raising is used |
| LABELS = variates, factors or texts | Structure for labelling influential points |
| NUNITS = tables, scalars or variates | Numbers of units in each stratum in the population |
| XTOTALS = tables, scalars or variates | Population totals of the base data in each stratum |
| TOTALS = tables or scalars | Saves total estimates |
| SETOTALS = tables or scalars | Saves standard errors of estimates |
| MEANS = tables or scalars | Saves mean estimates |
| SEMEANS = tables or scalars | Saves standard errors of mean estimates |
| RATIOS = tables | Saves estimates of ratios |
| FITTEDVALUES = variates | Saves fitted values for the observations |
| INFLUENCE = variates | Saves influence statistics |
| LTOTALS = tables or scalars | Saves lower confidence limit for total |
| UTOTALS = tables or scalars | Saves upper confidence limit for total |
| LMEANS = tables or scalars | Saves lower confidence limit for mean |
| UMEANS = tables or scalars | Saves upper confidence limit for mean |
| VARIANCES = tables or scalars | Saves residual variances in each stratum |
Description
SVSTRATIFIED analyses the results from a stratified random survey, either by expansion or ratio raising, and allows detection of outliers. The sample data are supplied, in a variate, using the Y parameter. Similarly the base data are provided using the X parameter. The LABELS parameter can supply a variate, factor or text for labelling individual units in the output. If X is unset or missing, expansion raising is used (i.e. the usual stratified random sampling analysis) but within a stratum units must either all have base data or all lack it. (Note: stratum is used here in the survey sense, not as in the ANOVA directive: i.e. the units are assumed to be classified into groups, and each group is called a stratum.) If option XMISSING is set to fault, any missing base data will cause a fault.
The vectors Y, X and LABELS should usually have one row for each unit in the survey population, with unsampled or non-responding units having a missing value in the Y variate. However, if parameter NUNITS is set, the Y variate may contain only the response data; NUNITS then supplies the information about the number of units in each stratum in the full population. Similarly, if ratio estimation is required, XTOTALS should contain the population totals of X in each stratum.
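As a rough illustration of expansion raising with NUNITS, the following Python sketch computes the estimate for each stratum as the population size multiplied by the sample mean. The data are the Orkney oats sample and population sizes from the Example section. The variance formula is the usual textbook one with a finite population correction; it is an assumption here that it corresponds to the "conventional approximations" the procedure uses, and the function names are hypothetical.

```python
from statistics import mean, variance

# Sample responses by stratum (Orkney oats sample from the Example section).
sample = {
    1: [15, 20, 18, 18],
    2: [23, 27, 25, 60],
    3: [28, 128, 69, 72],
}
# Population size of each stratum, as would be supplied through NUNITS.
n_units = {1: 12, 2: 12, 3: 11}


def expansion_total(y, big_n):
    """Expansion estimate of a stratum total: N_h times the sample mean."""
    return big_n * mean(y)


def expansion_variance(y, big_n):
    """Textbook variance of the stratum total (assumed formula, with
    finite population correction 1 - n_h/N_h)."""
    n = len(y)
    return big_n ** 2 * (1 - n / big_n) * variance(y) / n


totals = {h: expansion_total(y, n_units[h]) for h, y in sample.items()}
grand_total = sum(totals.values())
# Strata are sampled independently, so the variances add.
se_grand_total = sum(
    expansion_variance(y, n_units[h]) for h, y in sample.items()
) ** 0.5

print(totals)
print(grand_total, se_grand_total)
```

The sketch only mirrors the arithmetic; in Genstat the same quantities are obtained from the TOTALS and SETOTALS parameters.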
The METHOD option specifies which method of ratio estimation to use. The setting separate estimates a ratio for each stratum, whereas settings combined and classicalcombined assume a common ratio in all strata. The classicalcombined method follows the approach shown in most textbooks, where the estimate for a stratum is given by ∑X × ratio where the summation is over all units in the stratum. This approach can produce illogical estimates in some situations (e.g. the estimate may be less than the sum of the responses) and so the combined method estimates only for the unobserved units and adds this to the sum of the observed responses in the stratum, i.e. ∑Y + ∑X × ratio where the summation of Y is over sampled (or responding) units and the summation of X is over unsampled units. Option COMBINEDSTRATUM is used with the separate ratio method and allows the ratio in a particular stratum to be reset to the combined ratio value; this can be a useful technique for dealing with the extreme ratios sometimes produced when the sampling fraction in a stratum is very low.
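The difference between the two common-ratio estimates can be seen in a small numerical sketch. The stratum data and the ratio value below are invented for illustration, and the function names are hypothetical. How the procedure itself computes the common ratio is not shown; the ratio is simply taken as given.

```python
def classical_combined_estimate(x_all, ratio):
    """Classical estimate: sum of X over all units in the stratum, times the ratio."""
    return sum(x_all) * ratio


def combined_estimate(y_sampled, x_unsampled, ratio):
    """Combined estimate: observed responses plus ratio predictions
    for the unsampled units only."""
    return sum(y_sampled) + sum(x_unsampled) * ratio


# Hypothetical stratum of four units; None marks an unsampled unit.
x = [10, 20, 30, 40]
y = [30, 50, None, None]
ratio = 0.5  # assumed common ratio, supplied rather than estimated here

y_sampled = [yv for yv in y if yv is not None]
x_unsampled = [xv for xv, yv in zip(x, y) if yv is None]

classical = classical_combined_estimate(x, ratio)
combined = combined_estimate(y_sampled, x_unsampled, ratio)

print(classical)       # 50.0, below the 80 already observed in the sample
print(combined)        # 115.0
```

In this constructed case the classical estimate (50) is smaller than the sum of the observed responses (80), which is the kind of illogical result the combined method avoids.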
Printing is controlled via the PRINT option. The default settings are summary, totals and influence; these print a summary of the data, estimated totals and influence statistics, respectively. The setting means produces a table showing the estimated means, whilst ratios produces a low-resolution plot of the confidence limits for the ratio estimates; this can be useful when deciding whether a combined ratio estimate is to be used. The setting extra displays extra information relating to the analysis, including sums and means of the response data and raising factors (weights).
The CIPROBABILITY option sets the probability level used in calculation of confidence limits for means and totals. The CIMETHOD option controls how confidence limits are formed after bootstrapping: percentile uses simple percentiles of the bootstrapped distribution, whilst tdistribution calculates a standard error from the bootstrapped estimates and then uses the t-distribution to form intervals; the default of automatic uses the percentile method unless fewer than 400 bootstrap samples have been made.
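The automatic rule and the percentile limits might be sketched as follows. This is an illustration only: the function names are hypothetical, the rule for choosing which ordered value to use as a percentile is an assumption (the procedure's exact rule is not stated), and the sketch applies only when bootstrapping has been requested.

```python
def resolve_ci_method(cimethod, nboot):
    """Apply the documented rule: 'automatic' means percentile,
    unless fewer than 400 bootstrap samples have been made."""
    if cimethod != "automatic":
        return cimethod
    return "tdistribution" if nboot < 400 else "percentile"


def percentile_limits(boot_estimates, ciprobability=0.95):
    """Simple percentile limits from bootstrap estimates.
    The index rule (rounding p * (B - 1)) is an assumption."""
    ordered = sorted(boot_estimates)
    last = len(ordered) - 1
    tail = (1 - ciprobability) / 2
    lower = ordered[round(tail * last)]
    upper = ordered[round((1 - tail) * last)]
    return lower, upper


print(resolve_ci_method("automatic", 200))
print(resolve_ci_method("automatic", 1000))
print(percentile_limits(list(range(1000))))
```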
The NINFLUENCE option controls the number of points of high influence printed. The COMPACT option can be used to switch to a compact, plain-text style for the output, designed for printing concise summaries of an analysis. When COMPACT=yes, the information printed depends on the width of the first output channel, with more information being displayed when this can be done without splitting tables.
By default all standard errors and confidence limits are calculated using the conventional approximations. Alternatively, bootstrap methods may be used by setting the NBOOT option to the required number of bootstrap samples. In the case of ratio estimation, the samples are used to form bootstrap estimates of the ratio, which are then applied to the known population totals for X. Bootstrapping is carried out independently in each stratum, using the method described by Särndal et al. (1992, page 442); this involves creating a “pseudopopulation” containing n replicates of each observation, where n is the nearest integer to the expansion raising factor (inverse of inclusion probability) for the stratum. Bootstrap samples of the same size as the original sample are then taken from the pseudopopulation and used to compute the estimates. The SEED option specifies the seed to use in the random number generator used to construct the bootstrap samples. The default value of zero continues an existing sequence of random numbers or, if the generator has not yet been used in this run of Genstat, it initializes the generator automatically.
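The pseudopopulation idea can be sketched for a single stratum under expansion raising. This is a minimal sketch and not the procedure's implementation: the function names are hypothetical, and drawing the bootstrap samples without replacement (to mimic the original design) is an assumption, as the text does not say how the samples are drawn.

```python
import random


def pseudopopulation(y_sample, n_units):
    """Replicate each observation n times, where n is the nearest
    integer to the expansion raising factor N_h / n_h."""
    raising_factor = n_units / len(y_sample)
    replicates = int(raising_factor + 0.5)  # nearest integer, halves rounded up
    return [y for y in y_sample for _ in range(replicates)]


def bootstrap_stratum_totals(y_sample, n_units, nboot, rng):
    """Bootstrap expansion estimates of the stratum total."""
    pool = pseudopopulation(y_sample, n_units)
    n = len(y_sample)
    totals = []
    for _ in range(nboot):
        # Assumption: resample without replacement from the pseudopopulation.
        resample = rng.sample(pool, n)
        totals.append(n_units * sum(resample) / n)
    return totals


# First stratum of the oats example: 4 sampled farms out of 12.
rng = random.Random(12345)  # fixed seed for a reproducible sequence
boot = bootstrap_stratum_totals([15, 20, 18, 18], 12, 500, rng)
print(len(pseudopopulation([15, 20, 18, 18], 12)), len(boot))
```

Repeating this independently for each stratum and summing the stratum totals within each replicate would give bootstrap estimates of the grand total.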
Graphical output is available by setting the PLOT option. The setting single produces a single plot of the response data against X or against the stratum number if X is unset. A fitted line is shown if one of the combined ratio methods is used. The separate setting produces one graph for each stratum, with up to six graphs on each screen. All graphs are plotted on the log scale.
Output can be saved using the parameters TOTALS, SETOTALS, MEANS, SEMEANS, LTOTALS, UTOTALS, LMEANS and UMEANS. These are generally set to a table classified by the stratification factor but, if option SAVESUMMARY=yes, then they save scalars containing only the grand total summed over all strata. Ratios can be saved in a table using the RATIOS parameter, whilst the residual variances in each stratum can be saved using VARIANCES; the latter are useful for working out optimal allocation strategies for future surveys. Fitted values and influence statistics may be saved using parameters FITTEDVALUES and INFLUENCE. The fitted values are the X value multiplied by the appropriate ratio for each unit or, where expansion raising is used, the mean Y value for the stratum.
Options: PRINT, PLOT, XMISSING, RESTRICTED, STRATUMFACTOR, NINFLUENCE, METHOD, SAVESUMMARY, COMBINEDSTRATUM, ROWS, COLUMNS, NBOOT, SEED, CIPROBABILITY, CIMETHOD, COMPACT.
Parameters: Y, X, LABELS, NUNITS, XTOTALS, TOTALS, SETOTALS, MEANS, SEMEANS, RATIOS, FITTEDVALUES, INFLUENCE, LTOTALS, UTOTALS, LMEANS, UMEANS, VARIANCES.
Method
The methods used are described in most survey analysis textbooks; see for example, Sampford (1962) or Lehtonen & Pahkinen (1994). Most calculations are carried out using Genstat table structures.
Action with RESTRICT
The action with RESTRICT depends on the setting of the RESTRICTED option. By default restricted units are totally excluded from the analysis. If RESTRICTED is set to add, restricted observations are excluded from the ratio calculations but then added back into the total estimates; this is a technique for dealing with nonrepresentative outliers (see e.g. Lee, 1995), which are believed to be genuine observations but are not representative of the wider population.
References
Lee, H. (1995). Outliers in Business Surveys. Chapter 26 of Business Survey Methods (ed. Cox, Binder, Chinnappa, Christianson, Colledge & Kott). Wiley, New York.
Lehtonen, R. & Pahkinen, E.J. (1994). Practical Methods for Design and Analysis of Complex Surveys. Wiley, New York.
Sampford, M.R. (1962). An Introduction to Sampling Theory. Oliver & Boyd, London.
Särndal, C.-E., Swensson, B. & Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag, New York.
See also
Procedures: SVBOOT, SVCALIBRATE, SVGLM, SVHOTDECK, SVREWEIGHT, SVSAMPLE, SVTABULATE, SVWEIGHT.
Commands for: Survey analysis.
Example
CAPTION 'SVSTRATIFIED example',\
        'Orkney oats data (Sampford, Table 5.1, page 61).';\
        STYLE=meta,plain
" Firstly stratified random sample, entered with sample data only,
  plus table with population size - see Table 6.1, page 73."
VARIATE Oats
READ Oats
15 20 18 18 23 27 25 60 28 128 69 72 :
FACTOR [LEVELS=3; VALUES=4(1,2,3)] Stratum
TABLE [CLASS=Stratum; VALUES=12,12,11] N
SVSTRATIFIED [PRINT=summary,totals; STRATUMFACTOR=Stratum] Oats; NUNITS=N
" Secondly ratio analysis - data entered as one row for each farm
  in the population - see page 109."
VARIATE Oats
READ Farm
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 :
READ Crops
50 50 52 58 60 60 62 65 65 68 71 74 78 90 91 92 96 110 140 140 156 156 190 198 209 240 274 300 303 311 324 330 356 410 430 :
READ Oats
17 17 10 16 6 15 20 18 14 20 24 18 23 0 27 34 25 24 43 48 44 45 60 63 70 28 62 59 66 58 128 38 69 72 103 :
" To form the sample of 5 farms used, replace the others with missing values."
CALCULATE Oats=MVINSERT(Oats; Farm.NI.!(1,15,23,30,33))
SVSTRATIFIED [PRINT=summary,totals,means] Oats; X=Crops