Analyses stratified random surveys by expansion or ratio raising (S.D. Langton).
|Controls printed output (
||Controls which high-resolution graphs are plotted (
||Action if x-variable contains missing values (
||Action with restricted (or filtered) observations (
||Stratification factor; default
||Number of influential points to print; default 10|
||Method for ratio analysis (
||Whether to save just the overall summaries instead of those for each stratum (
||Stratum for which the ratio should be set to the combined ratio estimate; default
||Number of rows of plot-matrix; default
||Number of columns of plot-matrix; default
||Number of bootstrap samples to use; default 0|
||Seed for random number generator for bootstrap; default 0|
||The probability level for the confidence intervals; default 0.95|
||Method for forming confidence intervals (
||Whether to produce output in a compact (plaintext) format (
||Base data; if unset expansion raising is used|
||Structure for labelling influential points|
||Numbers of units in each stratum in the population|
||Population totals of the base data in each stratum|
||Saves total estimates|
||Saves standard errors of estimates|
||Saves mean estimates|
||Saves standard errors of mean estimates|
||Saves estimates of ratios|
||Saves fitted values for the observations|
||Saves influence statistics|
||Saves lower confidence limit for total|
||Saves upper confidence limit for total|
||Saves lower confidence limit for mean|
||Saves upper confidence limit for mean|
||Saves residual variances in each stratum|
SVSTRATIFIED analyses the results from a stratified random survey, either by expansion or ratio raising, and allows detection of outliers. The sample data are supplied, in a variate, using the Y parameter. Similarly, the base data are provided using the X parameter. The LABELS parameter can supply a variate, factor or text for labelling individual units in the output. If X is unset or missing, expansion raising is used (i.e. the usual stratified random sampling analysis), but within a stratum the units must either all have base data or all lack it. (Note: stratum is used here in the survey sense, not as in the ANOVA directive; i.e. the units are assumed to be classified into groups, and each group is called a stratum.) If option XMISSING is set to fault, any missing base data will cause a fault.
Y, X and LABELS should usually have one row for each unit in the survey population, with unsampled or non-responding units having a missing value in the Y variate. However, if parameter NUNITS is set, the Y variate may contain only the response data; NUNITS then supplies the number of units in each stratum in the full population. Similarly, if ratio estimation is required, XTOTALS should contain the population totals of X in each stratum.
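As a rough illustration of expansion raising when only the sampled units are supplied together with the stratum population sizes (the role played by NUNITS), the following Python sketch computes stratum totals for the Orkney oats sample used in the example at the end of this page. It is not Genstat code and is not the procedure's implementation; the function name expansion_totals is made up for the illustration.

```python
from collections import defaultdict

def expansion_totals(y, stratum, n_units):
    """Return {stratum: estimated total} by expansion raising.

    y       -- sampled responses
    stratum -- stratum label for each response
    n_units -- {stratum: number of units in the population}
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, h in zip(y, stratum):
        sums[h] += value
        counts[h] += 1
    # The raising factor (weight) in stratum h is N_h / n_h.
    return {h: n_units[h] / counts[h] * sums[h] for h in sums}

oats = [15, 20, 18, 18, 23, 27, 25, 60, 28, 128, 69, 72]
stratum = [1] * 4 + [2] * 4 + [3] * 4
totals = expansion_totals(oats, stratum, {1: 12, 2: 12, 3: 11})
grand_total = sum(totals.values())
```

With these data the sketch gives stratum totals of 213, 405 and 816.75, and a grand total of 1434.75.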
The METHOD option specifies which method of ratio estimation to use. The setting separate estimates a ratio for each stratum, whereas the settings combined and classicalcombined assume a common ratio in all strata. The classicalcombined method follows the approach shown in most textbooks, where the estimate for a stratum is given by ∑X × ratio, with the summation taken over all units in the stratum. This approach can produce illogical estimates in some situations (e.g. the estimate may be less than the sum of the responses), so the combined method estimates only for the unobserved units and adds this to the sum of the observed responses in the stratum, i.e. ∑Y + ∑X × ratio, where the summation of Y is over sampled (or responding) units and the summation of X is over unsampled units. Option COMBINEDSTRATUM is used with the separate ratio method and allows the ratio in a particular stratum to be reset to the combined ratio value; this can be a useful technique for dealing with the extreme ratios sometimes produced when the sampling fraction in a stratum is very low.
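The difference between the two common-ratio formulas can be seen in a small Python sketch. It simply evaluates the two expressions given above for one stratum; the data are made up, the function name stratum_estimate is hypothetical, and the common ratio is taken as given because its estimation is not described here.

```python
def stratum_estimate(x, y, ratio, classical=False):
    """Estimate a stratum total of y from base data x and a ratio.

    x     -- base data for every unit in the stratum
    y     -- response for each unit, or None if unsampled / non-responding
    ratio -- ratio to apply (assumed to have been estimated already)
    """
    if classical:
        # Textbook form: the ratio is applied to the base data of all units.
        return ratio * sum(x)
    observed = sum(yi for yi in y if yi is not None)
    unobserved_x = sum(xi for xi, yi in zip(x, y) if yi is None)
    # Observed responses are kept as they are; only the rest is predicted.
    return observed + ratio * unobserved_x

# Made-up stratum whose sampled responses lie well above ratio * x.
x = [10, 20, 30, 40]
y = [9, 18, None, None]
common_ratio = 0.25

classical = stratum_estimate(x, y, common_ratio, classical=True)
combined = stratum_estimate(x, y, common_ratio)
```

Here the classical form gives 25, which is less than the 27 already observed in the stratum, whereas the other form gives 44.5. If the ratio applied is the stratum's own ratio of ∑Y to ∑X over its sampled units, the two expressions coincide algebraically.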
Printing is controlled via the PRINT option. The settings summary, totals and influence print a summary of the data, the estimated totals and the influence statistics, respectively. The setting means produces a table showing the estimated means, whilst ratio produces a low-resolution plot of the confidence limits for the ratio estimates; this can be useful when deciding whether a combined ratio estimate is to be used. The setting extra displays extra information relating to the analysis, including sums and means of the response data and raising factors (weights).
The CIPROBABILITY option sets the probability level used in calculating confidence limits for means and totals. The CIMETHOD option controls how confidence limits are formed after bootstrapping: percentile uses simple percentiles of the bootstrapped distribution, whilst tdistribution calculates a standard error from the bootstrapped estimates and then uses the t-distribution to form intervals; the default, automatic, uses the percentile method unless fewer than 400 bootstrap samples have been made.
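The choice between the two interval methods can be sketched in Python as follows. This is only an illustration of the rule described above, not the procedure's code: the function name bootstrap_interval is hypothetical, the percentile positions are chosen by simple rounding, the t interval is assumed to be centred on the original estimate, and the t quantile must be supplied by the caller because the degrees of freedom are not stated here.

```python
import statistics

def bootstrap_interval(boot, estimate, probability=0.95,
                       method="automatic", t_quantile=None):
    """Form a confidence interval from bootstrap estimates.

    boot       -- list of bootstrap estimates
    estimate   -- original point estimate (centre of the t interval; assumed)
    t_quantile -- t-distribution quantile, required for "tdistribution"
    """
    if method == "automatic":
        # Percentile method unless fewer than 400 bootstrap samples were made.
        method = "percentile" if len(boot) >= 400 else "tdistribution"
    if method == "percentile":
        ordered = sorted(boot)
        tail = (1.0 - probability) / 2.0
        last = len(ordered) - 1
        # Simple percentiles: pick the order statistics nearest each tail.
        return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]
    if t_quantile is None:
        raise ValueError("t_quantile is required for the tdistribution method")
    se = statistics.stdev(boot)  # standard error from the bootstrap estimates
    return estimate - t_quantile * se, estimate + t_quantile * se

many = bootstrap_interval(list(range(1001)), 500)            # percentile
few = bootstrap_interval([1.0, 2.0, 3.0], 2.0, t_quantile=2.0)  # t interval
```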
The NINFLUENCE option controls the number of points of high influence that are printed. The COMPACT option can be used to switch to a compact, plain-text style for the output, designed for printing concise summaries of an analysis. When COMPACT=yes, the information printed depends on the width of the first output channel, with more information being displayed when this can be done without splitting tables.
By default, all standard errors and confidence limits are calculated using the conventional approximations. Alternatively, bootstrap methods may be used by setting the NBOOT option to the required number of bootstrap samples. In the case of ratio estimation, the samples are used to form bootstrap estimates of the ratio, which are then applied to the known population totals for X. Bootstrapping is carried out independently in each stratum, using the method described by Särndal et al. (1992, page 442); this involves creating a "pseudopopulation" containing n replicates of each observation, where n is the nearest integer to the expansion raising factor (inverse of the inclusion probability) for the stratum. Bootstrap samples of the same size as the original sample are then taken from the pseudopopulation and used to compute the estimates. The SEED option specifies the seed for the random number generator used to construct the bootstrap samples. The default value of zero continues an existing sequence of random numbers or, if the generator has not yet been used in this run of Genstat, initializes the generator automatically.
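The pseudopopulation idea can be sketched in Python for a single stratum under expansion raising. This is an illustration only, with several assumptions that are not stated above: the function name bootstrap_stratum_totals is hypothetical, ties in "nearest integer" are rounded up, and the bootstrap samples are drawn from the pseudopopulation without replacement.

```python
import random

def bootstrap_stratum_totals(y, n_units, nboot, seed):
    """Bootstrap the expansion estimate of one stratum total.

    y       -- sampled responses in the stratum
    n_units -- number of units in the stratum in the population
    nboot   -- number of bootstrap samples
    seed    -- seed for Python's random number generator
    """
    rng = random.Random(seed)
    n = len(y)
    # Replicate each observation the nearest integer to N / n times
    # (N / n is the expansion raising factor); ties round up (assumption).
    replicates = int(n_units / n + 0.5)
    pseudopopulation = [value for value in y for _ in range(replicates)]
    totals = []
    for _ in range(nboot):
        # Sample of the original size, drawn without replacement (assumption).
        resample = rng.sample(pseudopopulation, n)
        totals.append(n_units / n * sum(resample))
    return totals

# Third stratum of the Orkney oats sample: 4 sampled farms out of 11.
boot_totals = bootstrap_stratum_totals([28, 128, 69, 72], 11, 200, seed=1)
```

The spread of boot_totals (for example its standard deviation or percentiles) then serves as the basis for standard errors or confidence limits.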
Graphical output is available by setting the PLOT option. The setting single produces a single plot of the response data against X, or against the stratum number if X is unset. A fitted line is shown if one of the combined ratio methods is used. The separate setting produces one graph for each stratum, with up to six graphs on each screen. All graphs are plotted on the log scale.
Estimates of the totals and means, with their standard errors and confidence limits, can be saved using the corresponding parameters (for example, UMEANS saves the upper confidence limits for the means). These are generally set to a table classified by the stratification factor but, if option SAVESUMMARY=yes, they save scalars containing only the overall summaries (e.g. the grand total summed over all strata). Ratios can be saved in a table using the RATIOS parameter, whilst the residual variances in each stratum can be saved using VARIANCES; the latter are useful for working out optimal allocation strategies for future surveys. Fitted values and influence statistics may also be saved using the corresponding parameters (INFLUENCE for the influence statistics). The fitted values are the X value multiplied by the appropriate ratio for each unit or, where expansion raising is used, the mean Y value for the stratum.
The methods used are described in most survey analysis textbooks; see for example, Sampford (1962) or Lehtonen & Pahkinen (1994). Most calculations are carried out using Genstat table structures.
The action taken when the data are restricted (for example by the RESTRICT directive) depends on the setting of the RESTRICTED option. By default, restricted units are totally excluded from the analysis. If RESTRICTED is set to add, restricted observations are excluded from the ratio calculations but then added back into the total estimates; this is a technique for dealing with nonrepresentative outliers (see e.g. Lee, 1995), i.e. observations that are believed to be genuine but are not representative of the wider population.
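One plausible reading of the add setting, for a single stratum under ratio raising, is sketched below in Python. The details are assumptions rather than a description of the procedure: the function name total_with_outliers_added is hypothetical, the ratio is taken as ∑Y / ∑X over the unrestricted sampled units, and the restricted responses are simply added to the total as observed.

```python
def total_with_outliers_added(x, y, restricted):
    """Ratio-raised total in which restricted units represent only themselves.

    x          -- base data for every unit in the stratum
    y          -- response, or None for unsampled units
    restricted -- True for sampled units excluded by the restriction
    """
    used = [(xi, yi) for xi, yi, r in zip(x, y, restricted)
            if yi is not None and not r]
    # The ratio is calculated without the restricted (outlying) units ...
    ratio = sum(yi for _, yi in used) / sum(xi for xi, _ in used)
    unsampled_x = sum(xi for xi, yi in zip(x, y) if yi is None)
    outliers = sum(yi for yi, r in zip(y, restricted) if yi is not None and r)
    # ... but their observed responses are added back into the total.
    return sum(yi for _, yi in used) + outliers + ratio * unsampled_x

# Made-up stratum: the third unit is a genuine but unrepresentative outlier.
x = [10, 20, 30, 40, 50]
y = [5, 10, 300, None, None]
restricted = [False, False, True, False, False]
total = total_with_outliers_added(x, y, restricted)
```

In this made-up case the total is 360; leaving the outlier in the ratio calculation would instead give a ratio of 5.25 and a total of 787.5.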
Lee, H. (1995). Outliers in Business Surveys. Chapter 26 of Business Survey Methods (ed. Cox, Binder, Chinnappa, Christianson, Colledge & Kott). Wiley, New York.
Lehtonen, R. & Pahkinen, E.J. (1994). Practical Methods for Design and Analysis of Complex Surveys. Wiley, New York.
Sampford, M.R. (1962). An Introduction to Sampling Theory. Oliver & Boyd, London.
Särndal, C.-E., Swensson, B. & Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag, New York.
Commands for: Survey analysis.
CAPTION 'SVSTRATIFIED example',\
        'Orkney oats data (Sampford, Table 5.1, page 61).';\
        STYLE=meta,plain
" Firstly stratified random sample, entered with sample data only,
  plus table with population size - see Table 6.1, page 73."
VARIATE Oats
READ Oats
15 20 18 18 23 27 25 60 28 128 69 72 :
FACTOR [LEVELS=3; VALUES=4(1,2,3)] Stratum
TABLE [CLASS=Stratum; VALUES=12,12,11] N
SVSTRATIFIED [PRINT=summary,totals; STRATUMFACTOR=Stratum] Oats; NUNITS=N
" Secondly ratio analysis - data entered as one row for each farm
  in the population - see page 109."
VARIATE Oats
READ Farm
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 :
READ Crops
50 50 52 58 60 60 62 65 65 68 71 74 78 90 91 92 96 110 140 140 156 156 190 198 209 240 274 300 303 311 324 330 356 410 430 :
READ Oats
17 17 10 16 6 15 20 18 14 20 24 18 23 0 27 34 25 24 43 48 44 45 60 63 70 28 62 59 66 58 128 38 69 72 103 :
" To form the sample of 5 farms used, replace the others with missing values."
CALCULATE Oats=MVINSERT(Oats; Farm.NI.!(1,15,23,30,33))
SVSTRATIFIED [PRINT=summary,totals,means] Oats; X=Crops