Exploratory data analysis

The initial chapters of this e-book describe statistical methods to explore and summarise a data set. Appropriate methods depend on the structure of the data set — the number and types of its variables.

One numerical variable
Chapter 2 describes graphical and numerical summaries of a single numerical variable.
Two or more numerical variables
Chapter 3 examines the relationship between two or more numerical variables.
One time-ordered numerical variable
In Chapter 4, we examine statistical methods for ordered data (time series).
Categorical variables
In Chapter 5, graphical and numerical methods are presented for categorical variables.
Many variables
Chapter 6 briefly presents a few methods for data sets with three or more variables.

Data collection

Statisticians should be involved before any data are collected. Applying statistical principles to the data collection process helps to ensure that the resulting data can be meaningfully analysed. Chapters 7 and 8 explore the idea of random sampling and describe some principles that should be followed in data collection.

Inference

To fully understand the information that is contained in most data sets, we must take account of randomness — if we collected the data again by repeating an experiment or collecting data from different people, the values would often be different. The relevant statistical methods are collectively called inference.
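
The sample-to-sample variability described above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the population (1000 values from a normal distribution with mean 50 and standard deviation 10), the sample size of 25, the seed, and the helper name `sample_mean` are all hypothetical choices made for illustration and do not come from the e-book.

```python
import random
import statistics


def sample_mean(population, n, rng):
    """Return the mean of a simple random sample of size n (without replacement)."""
    return statistics.mean(rng.sample(population, n))


# Fixed seed so that the run is repeatable (the seed value is arbitrary).
rng = random.Random(1)

# Hypothetical population: 1000 values from a normal distribution
# with mean 50 and standard deviation 10.
population = [rng.gauss(50, 10) for _ in range(1000)]

# "Collect the data again" five times and summarise each sample by its mean.
means = [sample_mean(population, 25, rng) for _ in range(5)]

for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean = {m:.2f}")
```

Each repetition draws different individuals from the same population, so the five sample means generally differ from one another. Inference is concerned with quantifying this kind of variation.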

Again, the details of the statistical analysis depend mostly on the structure of the data set — the number and types of its variables.

One variable
Chapters 9 and 10 develop the methodology (confidence intervals and hypothesis tests) to answer questions about a single numerical or categorical variable.
Two or more groups
Chapter 11 presents confidence intervals and hypothesis tests to compare two or more groups.
Two numerical variables
Chapter 12 models the relationship between two numerical variables and gives confidence intervals and hypothesis tests about these models.
Two categorical variables
Chapter 13 describes methodology to compare groups of categorical data or to examine the relationship between two categorical variables (contingency tables).