Long page descriptions

Chapter 1   Introduction

1.1   Surveys and experiments

1.1.1   Surveys and other samples

In observational studies such as surveys, data are sampled from populations to describe the characteristics of the populations.

1.1.2   Causal relationships

Causal relationships between variables are particularly important since they predict the effect on one variable of changing the other.

1.1.3   Observational data and relationships

Causal relationships cannot be inferred from observational data since observed relationships may result from the influence of further unrecorded variables.

1.1.4   Experiments

In experiments, the values of one variable are under the control of the researcher.

1.1.5   Experiments and causal relationships

A well-designed experiment gives the researcher information about how changes to the controlled variable affect the response.

1.2   Practical issues in design

1.2.1   Problem and objectives

The problem being addressed by a project and its specific objectives should be clearly stated before data are collected. The researcher should also consider how the project results will be used.

1.2.2   Experiment or survey?

Depending on the objectives of a project, it may be appropriate to conduct an experiment, a survey or both.

1.2.3   Measurements

In experiments and surveys, data are collected from discrete units. Decisions about which measurements should be made from each unit must be guided by the objectives of the study.

1.2.4   Difficulties with human subjects

When experiments and surveys are conducted with people rather than other types of unit, various practical issues complicate the design.

1.3   Principles of experimental design

1.3.1   Experiments and treatments

Experiments are conducted to assess the effect of some categorical or numerical explanatory variable on a response. Experiments are characterised by the fact that the researcher can control the values of the explanatory variable that are used.

1.3.2   Variable experimental units

In many experiments, the units on which the experiment is conducted are not identical. The varying characteristics of the experimental units may affect the response.

1.3.3   A badly designed experiment

If the treatments are badly allocated to experimental units, the experiment may over- or under-estimate their effect.

1.3.4   Confounding

If the experimental units chosen to receive one treatment also differ from the other units in other ways, it will be impossible to tell whether the treatment or those other characteristics of the units affect the response.
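
A small numerical sketch may make the problem concrete. The soil types, baseline yields and the true treatment effect of 2.0 are all hypothetical values chosen for illustration, and there is no random variation so that the arithmetic can be checked by hand.

```python
from statistics import mean

# Hypothetical baseline yields: good soil gives higher yields than poor soil.
baseline = {"good": 20.0, "poor": 14.0}
true_effect = 2.0  # assumed real benefit of the "new" treatment

# Eight plots: four on good soil, then four on poor soil.
soils = ["good"] * 4 + ["poor"] * 4

# Confounded design: the new treatment is used only on the good soil.
confounded = ["new"] * 4 + ["old"] * 4
# Balanced design: each treatment is used on two plots of each soil type.
balanced = ["new", "new", "old", "old"] * 2


def apparent_effect(treatments):
    """Difference between the mean responses of the 'new' and 'old' plots."""
    responses = {"new": [], "old": []}
    for soil, treatment in zip(soils, treatments):
        gain = true_effect if treatment == "new" else 0.0
        responses[treatment].append(baseline[soil] + gain)
    return mean(responses["new"]) - mean(responses["old"])


print(apparent_effect(confounded))  # soil difference is mixed in with the treatment
print(apparent_effect(balanced))    # recovers the assumed true effect
```

In the confounded allocation, the apparent effect combines the treatment effect with the difference between the soils, and the data alone cannot separate the two.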

1.3.5   Randomisation

Random allocation of treatments to experimental units avoids systematic over- or under-estimation of treatment effects.
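
One way to carry out a completely random allocation is sketched below. The function name `randomise`, the plot labels and the requirement of equal replication are assumptions made for this example, not part of the text.

```python
import random


def randomise(units, treatments, seed=None):
    """Allocate treatments to units at random, each treatment used equally often."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    reps = len(units) // len(treatments)
    # Build a list with each treatment repeated 'reps' times, then shuffle it.
    labels = [t for t in treatments for _ in range(reps)]
    rng.shuffle(labels)
    return dict(zip(units, labels))


# Hypothetical example: eight plots and two treatments.
plots = ["plot%d" % i for i in range(1, 9)]
allocation = randomise(plots, ["A", "B"], seed=1)
for unit, treatment in allocation.items():
    print(unit, treatment)
```

Because the shuffle does not use any characteristic of the units, no treatment is systematically favoured by being given to the better units.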

1.3.6   Replication

If each treatment is used in more than one experimental unit, unit-to-unit variability can be assessed. This gives information about whether differences between the treatments are more than chance differences.
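
As an illustration, the sketch below estimates unit-to-unit variability by pooling the variation within each treatment. The data are hypothetical, and the pooled standard deviation is one common summary, assumed here rather than taken from the text.

```python
from statistics import mean, variance

# Hypothetical responses: three replicate units for each of two treatments.
responses = {
    "A": [10.0, 12.0, 11.0],
    "B": [15.0, 17.0, 16.0],
}


def pooled_sd(groups):
    """Pooled within-treatment standard deviation (needs 2+ units per treatment)."""
    sum_sq = sum((len(v) - 1) * variance(v) for v in groups.values())
    deg_free = sum(len(v) - 1 for v in groups.values())
    return (sum_sq / deg_free) ** 0.5


difference = mean(responses["B"]) - mean(responses["A"])
print("difference between treatment means:", difference)
print("pooled unit-to-unit standard deviation:", pooled_sd(responses))
```

With only one unit per treatment, the within-treatment variances could not be computed, so there would be nothing against which to judge the difference between the means.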

1.3.7   Blocking

By grouping similar experimental units into blocks and randomly allocating treatments within blocks, the treatment effects can be estimated more accurately.
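
A minimal sketch of allocation within blocks follows, assuming that each block contains exactly one unit per treatment. The function name `randomise_in_blocks` and the field and plot labels are hypothetical.

```python
import random


def randomise_in_blocks(blocks, treatments, seed=None):
    """Randomly allocate every treatment once within each block."""
    rng = random.Random(seed)
    allocation = {}
    for block_name, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("each block must have one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # a separate random order for each block
        for unit, treatment in zip(units, order):
            allocation[unit] = (block_name, treatment)
    return allocation


# Hypothetical example: two fields (blocks), each with three similar plots.
blocks = {
    "field1": ["f1p1", "f1p2", "f1p3"],
    "field2": ["f2p1", "f2p2", "f2p3"],
}
block_allocation = randomise_in_blocks(blocks, ["A", "B", "C"], seed=2)
for unit, (block_name, treatment) in block_allocation.items():
    print(block_name, unit, treatment)
```

Every treatment appears once in every block, so comparisons between treatments are made between similar units and are not affected by differences between the blocks.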

1.4   Modelling variation

1.4.1   Explained and unexplained variation

Varying the values of the controlled factors causes some variability in the response; this is called explained variation. In some kinds of experiment, there is known structure to the experimental units that also results in explained variation. Other variation remains unexplained.

1.4.2   Modelling variation

Explained variation in the response is modelled with a function for the response mean that involves the levels of the factors (constants) and some unknown parameters. Unexplained variation is modelled with a probability distribution.

1.4.3   Normal model for the response

In normal models, the response is the sum of a component that depends on the experimental factors and the structure of the experimental units, and an error term that has a normal distribution with mean zero and constant standard deviation.
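
For the simplest case, a single factor and no structure in the experimental units, the model could be written as follows. The notation is an assumption made for this illustration: y_ij is the response of the j-th unit that gets treatment i, mu_i is the mean response for treatment i, and sigma is the constant standard deviation of the errors.

```latex
y_{ij} = \mu_i + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathrm{N}(0, \sigma^2)
```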

1.4.4   Parameter estimation

The model for the explained variation usually involves unknown parameters. These can be estimated by least squares.
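
The sketch below shows least squares for one simple case, assuming a single numerical factor and a straight-line model for the response mean. The data and the function name `least_squares_line` are hypothetical; models with more factors need the general matrix form of least squares.

```python
from statistics import mean


def least_squares_line(x, y):
    """Intercept and slope that minimise the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: the factor level (x) is set by the researcher.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.1, 4.9, 7.0]

intercept, slope = least_squares_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print("intercept:", intercept, "slope:", slope)
print("residual sum of squares:", sum(r ** 2 for r in residuals))
```

The residual sum of squares that is left after fitting describes the unexplained variation.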